Shannon showed that it was possible to achieve arbitrarily low error rates at any data rate less than channel capacity. By
the early Sixties, it had been realized that the real problem was how to achieve reasonable error rates with acceptable decoding
complexity at data rates anywhere near capacity. The author's research has been primarily motivated by this problem [1]-[22].
This lecture will offer an account of some of his adventures in this pursuit, and some preliminary conclusions.
The purpose of this talk is twofold: to give an elementary
and concrete introduction to symbolic dynamics and to discuss
two applications to coding problems.

We will begin with a brief discussion of the origins of symbolic
dynamics, going back to the work of Hadamard in 1898.

The rough idea is that symbolic dynamics provides a model
for the orbits of a classical dynamical system via a space of
sequences. Next we will introduce the basic concepts of symbolic
dynamics, emphasising sliding block codes. We will survey
some of the fundamental problems, solved and unsolved,
in the subject. Then we will see how work on these problems
has led to coding applications in two different settings:

1. The state splitting algorithm for constructing
encoders/decoders adapted to input-constrained channels
such as magnetic and optical recording channels.

2. An analysis of a class of spaces with homogeneity
properties that naturally generalize convolutional codes,
group codes [3], geometrically uniform codes [2], and
orbit systems [6].

For introductory reading on symbolic dynamics and its
applications, see the monograph [1], the textbook [4], and the
article [6] (§IV). For a tutorial on the state splitting algorithm
see [5].
Abstract — An important class of universal encoders
is the one where the encoder is fed with two inputs:

a) The incoming string of data to be compressed.

b) A “training sequence” that consists of the last $N$
data symbols that have been processed (i.e. a Sliding
Window algorithm).

We consider Fixed-to-Variable universal encoders
that noiselessly compress blocks of some fixed length $\ell$
and derive universal bounds on the rate of approach
of the compression to the $\ell$-th order (per letter) entropy
$H(X_1^\ell)$ or to the smaller conditional entropy
$H(X_1^{\ell-k} \mid X_{-k+1}^0)$ as a function of $\ell$ and of the length
$N$ of the training sequence $X_{-N+1}^0$.

We describe non-asymptotic uniform bounds on the performance
of data-compression algorithms in cases where the
length $N$ of the training sequence (“history”) that is available
to the encoder is not large enough to yield the
ultimate compression, namely the entropy of the source.
Two characteristic ultimate goals are considered: the $\ell$-th
order entropy $H(X_1^\ell)$, and the associated conditional entropy
$H(X_1^{\ell-k} \mid X_{-k+1}^0)$. The bounds are based on classical
information-theoretic convexity arguments. Nevertheless, it
is demonstrated that convexity arguments that work for one
case are totally useless for the other and vice versa. Furthermore,
these classical convexity arguments, when properly
used, lead to efficient universal data compression algorithms
for each of the two cases. For the sake of simplicity we confine
our attention to binary stationary ergodic sources.

The first case to be considered is the one where we would
like to find an upper bound on the length of a training sequence
needed in order to guarantee that any source in the
given class will yield a compression close to its $\ell$-th order
entropy $H_\ell$, and to derive a uniform bound on the rate of
approach to this entropy as a function of $\ell$ and $N$.

“Intuition” tells us to use the “plug-in” method: namely,
given a training sequence of length $N$, find the relative frequency
$Q(X_1^\ell)$ of all $\ell$-vectors in it. Find the appropriate
Huffman code and use it to encode the incoming $\ell$-blocks. The
expected compression will be $-E \log Q(X_1^\ell)$. Clearly, by convexity,
$-E \log Q(X_1^\ell) \ge \ell H(X_1^\ell)$ and eventually converges to
it. Alas, the convergence is not uniform!
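
The plug-in step can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names `block_frequencies` and `plugin_cost_bits` are hypothetical, blocks are counted with overlap, and the ideal code length $-\log_2 Q$ stands in for an actual Huffman code.

```python
import math
from collections import Counter

def block_frequencies(training, ell):
    """Relative frequency Q of every (overlapping) ell-block in the training sequence."""
    blocks = [tuple(training[i:i + ell]) for i in range(len(training) - ell + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return {block: count / total for block, count in counts.items()}

def plugin_cost_bits(block, Q):
    """Ideal code length -log2 Q(block); infinite if the block never occurred in training."""
    q = Q.get(tuple(block), 0.0)
    return -math.log2(q) if q > 0 else math.inf

# Toy training sequence (assumed data, not from the paper).
training = [0, 1, 0, 1, 0, 1, 0, 1]
Q = block_frequencies(training, 2)
seen_cost = plugin_cost_bits([0, 1], Q)
unseen_cost = plugin_cost_bits([0, 0], Q)
```

A block that never occurs in a short training sequence receives no finite code length here, which is one way to see why the convergence of the plug-in estimate cannot be uniform over all sources.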

1 This work was supported by the Technion Fund for the Promotion
of Research.

Let the training sequence be denoted by $X_{-N+1}^0$ and let
$N(X_{-N+1}^0 \mid X_1^\ell) \triangleq$ smallest $i \ge 1$ such that $X_{-i+1}^{\ell-i} = X_1^\ell$. If
no such $i$ can be found, $N(X_{-N+1}^0 \mid X_1^\ell) = N$. It then follows
from Kac's Lemma [1] that there exists a universal algorithm
(a variant of the LZ algorithm) with a length function
$L(X_1^\ell \mid X_{-N+1}^0)$ which is roughly equal to $\log N(X_{-N+1}^0 \mid X_1^\ell)$
when $N(X_{-N+1}^0 \mid X_1^\ell) \ne N$ or to $\ell$ otherwise, such that
$E L(X_1^\ell \mid X_{-N+1}^0) \le \ell [H(X_1^\ell) + O(\log \ell / \ell) + \delta + 2^{-\delta \ell}]$, where $\delta$ is
some arbitrarily small positive number. This uniform bound
holds if $N \ge 2^{B\ell}$, where $B$ satisfies $P[X_1^\ell : P(X_1^\ell) \le 2^{-B\ell}] \le \delta$.
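
The recurrence-time quantity and the rough length function can be sketched in Python as below. This is a hedged illustration, not the algorithm of [1]: the function names are hypothetical, a match is allowed to overlap the block being encoded, and the length function is taken to be exactly $\log_2$ of the recurrence time (or $\ell$ bits when no match exists), ignoring the overhead terms in the bound.

```python
import math

def recurrence_time(training, block):
    """Smallest lag i >= 1 at which `block` reappears, looking back from its own
    position; returns N = len(training) when no match is found."""
    N, ell = len(training), len(block)
    seq = list(training) + list(block)   # training occupies positions 0..N-1
    target = list(block)
    for i in range(1, N + 1):
        if seq[N - i:N - i + ell] == target:
            return i
    return N

def rough_length_bits(training, block):
    """Roughly log2 of the recurrence time, or ell raw bits if there is no match."""
    n = recurrence_time(training, block)
    if n != len(training):
        return math.log2(n)
    return float(len(block))

# Toy binary data (assumed, not from the paper).
training = [0, 1, 1, 0, 1, 0, 0, 1]
near = recurrence_time(training, [0, 1])      # block seen two symbols back
none = recurrence_time(training, [0, 0, 0])   # block never seen
```

In this toy case the block `[0, 1]` is found at lag 2 and costs about one bit, while `[0, 0, 0]` has no match and falls back to its raw length of three bits.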

But why be satisfied with achieving $H(X_1^\ell)$ and not try to
aim at some smaller conditional entropy where the conditioning
is on some suffix of the training sequence?

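Our second goal is to achieve a universal compression that
is close to $H(X_1^{\ell-k} \mid X_{-k+1}^0)$ where $1 \le k \le \ell - 1$. It is now
assumed that a certain mixing condition is satisfied [2]. By
Kac's lemma [1] and by convexity,

$$
\begin{aligned}
(\ell - k)\, H(X_1^{\ell-k} \mid X_{-k+1}^0)
&\ge E \log N(X_{-N+1}^0 \mid X_{-k+1}^{\ell-k}) - k H(X_{-k+1}^0) \\
&\ge E \log N(X_{-N+1}^0 \mid X_{-k+1}^{\ell-k}) + E \log \frac{n(X_{-N+1}^0 \mid X_{-k+1}^0)}{N-k+1} \\
&= E \log \frac{N(X_{-N+1}^0 \mid X_{-k+1}^{\ell-k})\, n(X_{-N+1}^0 \mid X_{-k+1}^0)}{N-k+1},
\end{aligned}
$$

where $n(X_{-N+1}^0 \mid X_{-k+1}^0)$ is the number of occurrences of an index $i$,
$i = k, k+1, \ldots, N$, such that $X_{-i+1}^{k-i} = X_{-k+1}^0$ (i.e. a “plug-in”
method!). Clearly, since $X_{-k+1}^0$ is a suffix of the training sequence
it is available to both the encoder and the decoder
prior to the processing of $X_1^{\ell-k}$.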
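
The occurrence count used in the conditional case can be sketched as follows. The function name `context_count` is hypothetical, and normalising by the number of length-$k$ windows, $N - k + 1$, is an assumption made for this illustration.

```python
def context_count(training, k):
    """Number of lags i in k..N at which the length-k window of the training
    sequence equals its own length-k suffix (the suffix itself counts once)."""
    seq = list(training)
    N = len(seq)
    context = seq[N - k:]
    return sum(1 for i in range(k, N + 1) if seq[N - i:N - i + k] == context)

# Toy binary training sequence (assumed data).
training = [0, 1, 0, 1, 1, 0, 1]
k = 2
n = context_count(training, k)
relative_frequency = n / (len(training) - k + 1)
```

Because the context is the suffix of the training sequence, both encoder and decoder can compute this count before the next block is processed.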

Thus, the existence of a simple universal encoding algorithm
that can uniformly approximate the lower bound on the
conditional entropy that is derived above follows immediately.

A conditional version of Kac's Lemma leads to yet another
algorithm (a conditional LZ variant) that applies to
all finite alphabet ergodic sources [3]. It is demonstrated in
[3] that, in a sense, this algorithm is efficient in that no other
universal data compression algorithm can do better when the
length of the training sequence is bounded by $N$ (for large $N$).

References 

[1] A. D. Wyner and J. Ziv, “The Sliding-Window Lempel-Ziv Algorithm
is Asymptotically Optimal” (Invited paper), Proceedings
of the IEEE, Vol. 82, June 1994, pp. 872-877.

[2] A. D. Wyner and J. Ziv, “Classification with Finite-Memory”,
submitted to the IEEE Transactions on Information Theory.

[3] Y. Hershkovits and J. Ziv, “On Sliding-Window Universal Data
Compression with Limited-Memory”, submitted to the IEEE
Transactions on Information Theory.




Quantum  Information  Theory 

Gilles  Brassard 1 

Département IRO, Université de Montréal
C.P. 6128, Succ. Centre-Ville, Montréal (Québec), Canada H3C 3J7


Abstract — Quantum information theory is at
the confluence of computer science and quantum
mechanics. We survey some of the most striking
recent developments in the field.

I.  Introduction:  the  Qubit 

Classical  and  quantum  information  are  very  different. 
Classical  information  can  be  read,  copied,  and  transcribed 
into  any  medium;  it  can  be  transmitted  and  broadcast,  but  it 
cannot  travel  faster  than  light.  Quantum  information  cannot 
be  read  or  copied  without  disturbance,  but  in  some  instances 
it  appears  to  propagate  instantaneously  or  even  backward  in 
time.  Together  the  two  kinds  of  information  can  perform  feats 
that  neither  could  achieve  alone.  For  more  details,  references, 
and  appropriate  credit  to  the  many  researchers  who  made  this 
work possible, please refer to my full paper in Current Trends
in Computer Science, Jan van Leeuwen (Editor), Lecture
Notes in Computer Science, Volume 1000 (special anniversary
volume), Springer-Verlag, 1995.

Quantum information theory has the potential to bring
about a spectacular revolution in computer science. Even
though current-day computers use quantum-mechanical
effects in their operation, for example through the use of
transistors, they are still very much classical computing devices.
A supercomputer is not fundamentally different from a purely
mechanical computer that could be built around simple relays:
their operation can be described purely in terms of classical
physics and they can simulate one another in a straightforward
manner, given sufficient storage. By contrast, computers
could in principle be built to profit from genuine
quantum phenomena that have no classical analogue, sometimes
providing exponential speed-up compared to classical
computers. Quantum information is also at the core of other
phenomena that would be impossible to achieve in a purely
classical world, such as unconditionally secure distribution of
secret cryptographic material.

At the heart of it all is the quantum bit, or qubit. In classical
information theory, a bit can take either value 0 or value 1.
According to quantum information theory, a qubit can be in a
linear superposition of the two classical states, with complex
coefficients. It is best visualized as a point on the surface of
a unit sphere whose North and South poles correspond to the
classical values. (This is not at all the same as taking a value
between 0 and 1 as in classical analogue computing.) In general,
qubits cannot be measured reliably: not more than one
classical bit of information can be extracted from any given
qubit and the more information you obtain about it, the more
you disturb it irreversibly. As an example of how quantum
information differs from classical information, it is possible in
some situations to extract more than twice as much information
from two identical qubits than from either one alone.
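
The picture of a qubit as a point on a unit sphere can be made concrete with a small Python sketch. The mapping below is the standard Bloch-sphere convention, which is an assumption here since the abstract does not name it; the function name `bloch_vector` is hypothetical.

```python
import math

def bloch_vector(alpha, beta):
    """Map the amplitudes of alpha|0> + beta|1> to a point (x, y, z) on the unit sphere.
    The amplitudes are normalised first, so any non-zero pair is accepted."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    a, b = complex(alpha) / norm, complex(beta) / norm
    cross = a.conjugate() * b
    x = 2 * cross.real
    y = 2 * cross.imag
    z = abs(a) ** 2 - abs(b) ** 2
    return (x, y, z)

north = bloch_vector(1, 0)      # classical value 0
south = bloch_vector(0, 1)      # classical value 1
equator = bloch_vector(1, 1)    # equal superposition with real coefficients
```

The two classical values land on the poles, while superpositions with complex coefficients fill the rest of the surface.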
II. Quantum Cryptography

The impossibility of measuring quantum information reliably
is at the core of quantum cryptography. When information is
encoded with non-orthogonal quantum states, any attempt
by an eavesdropper to access it necessarily entails a probability
of spoiling it irreversibly, which can be detected by the
legitimate users. This phenomenon can be exploited to implement
a key distribution system that is provably secure even
against an eavesdropper with unlimited computing power.
Several prototypes have been built, including one that is fully
operational over 30 kilometres of ordinary optical fibre. Further
experiments are currently under way across the lake of
Geneva. Quantum techniques may also assist in the achievement
of subtler cryptographic goals, such as protecting private
information while it is being used to reach public decisions.
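
The disturbance caused by an eavesdropper can be illustrated numerically. The sketch below assumes a four-state encoding in two conjugate bases and an intercept-and-resend attack; neither is spelled out in the abstract, and all names are hypothetical. It computes exactly the probability that the legitimate receiver, measuring in the sender's basis, obtains the wrong bit.

```python
import math
from itertools import product

S = 1 / math.sqrt(2)
# Two bases of real 2-vectors; states from different bases are non-orthogonal.
BASES = {"Z": [(1.0, 0.0), (0.0, 1.0)], "X": [(S, S), (S, -S)]}

def overlap_sq(u, v):
    """Squared inner product |<u|v>|^2 of two real 2-vectors."""
    return (u[0] * v[0] + u[1] * v[1]) ** 2

def error_probability_with_eavesdropper():
    """Average error rate when the eavesdropper measures in a random basis
    and resends the state she observed."""
    total, cases = 0.0, 0
    for basis, bit, eve_basis in product("ZX", (0, 1), "ZX"):
        sent = BASES[basis][bit]
        wrong = BASES[basis][1 - bit]
        # She sees outcome e with probability |<e|sent>|^2 and resends |e>.
        total += sum(overlap_sq(e, sent) * overlap_sq(wrong, e) for e in BASES[eve_basis])
        cases += 1
    return total / cases

error_rate = error_probability_with_eavesdropper()
```

Under these assumptions the attack introduces errors in a quarter of the positions, which is the kind of disturbance the legitimate users can detect by comparing a sample of their data.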

III.  Quantum  Computing 

Independent qubits are sufficient to produce nontrivial
cryptographic phenomena, but they are not very interesting
for computational purposes. For this, we must consider quantum
registers composed of $n$ qubits. Such registers can be in
an arbitrary superposition of all $2^n$ classical states. In principle,
a quantum computer can be programmed so that exponentially
many computation paths are taken simultaneously in
a single piece of hardware, a phenomenon known as quantum
parallelism. What makes this so powerful — and mysterious —
is the exploitation of constructive and destructive interference,
which allows for the reinforcement of the probability of obtaining
desired results while at the same time the probability of
spurious results is reduced or even annihilated. The most famous
example of quantum computation allows in principle for
the quick factorization of large integers on a quantum computer,
which has dramatic cryptographic significance.
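
Constructive and destructive interference can be seen already on a single qubit. The sketch below uses the Hadamard transform, which the abstract does not mention and which is chosen here only as the simplest example; the helper `apply` is hypothetical.

```python
import math

R = 1 / math.sqrt(2)
HADAMARD = [[R, R], [R, -R]]

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 amplitude vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

zero = [1.0, 0.0]
superposed = apply(HADAMARD, zero)        # two computation paths with equal amplitude
recombined = apply(HADAMARD, superposed)  # paths to |1> cancel, paths to |0> add up

probabilities = [abs(a) ** 2 for a in recombined]
```

After the second transform the two paths leading to outcome 1 carry opposite amplitudes and cancel, while the paths leading to outcome 0 reinforce each other.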

IV.  Quantum  Teleportation 

Even though quantum information cannot be measured in
general, it can be teleported from one place to another. It is
possible for two spatially separated qubits to be entangled, in
the sense that each of them behaves randomly when measured,
but they always give opposite results to the same measurement.
Let Alice and Bob share such a pair. If Alice makes
her mystery particle interact in the proper way with her share
of the pair, Bob's share will instantaneously become a replica
of the mystery particle up to rotation; at the same time Alice's
mystery particle loses its information but she learns which rotation
Bob must perform on his replica to match the original.
Imperfect stores of nonlocal qubit pairs can be purified by
local transformations and exchange of classical information.
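
The teleportation step can be checked with a small state-vector calculation. This is a sketch under assumptions: the "proper interaction" is modelled as a projection of Alice's two qubits onto the four Bell states, the shared pair is the singlet state, and the rotations are the Pauli matrices. All names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
SINGLET = {(0, 1): S, (1, 0): -S}   # shared pair: (Alice's half, Bob's half)
BELL = {                            # possible outcomes of Alice's joint measurement
    "Phi+": {(0, 0): S, (1, 1): S},
    "Phi-": {(0, 0): S, (1, 1): -S},
    "Psi+": {(0, 1): S, (1, 0): S},
    "Psi-": {(0, 1): S, (1, 0): -S},
}
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}

def apply(gate, state):
    """Multiply a 2x2 matrix by a length-2 amplitude vector."""
    return [sum(gate[r][c] * state[c] for c in range(2)) for r in range(2)]

def bob_state(psi, bell):
    """Unnormalised state of Bob's qubit given Alice's Bell outcome."""
    chi = [0j, 0j]
    for (a, b), beta in bell.items():
        for (b2, c), s in SINGLET.items():
            if b == b2:
                chi[c] += beta.conjugate() * psi[a] * s
    return chi

def fidelity(psi, chi):
    """Overlap |<psi|chi>|^2 after normalising chi."""
    norm = sum(abs(x) ** 2 for x in chi)
    inner = sum(p.conjugate() * x for p, x in zip(psi, chi))
    return abs(inner) ** 2 / norm

def teleport(psi):
    """For each outcome: (probability, rotation Bob should apply, resulting fidelity)."""
    results = {}
    for name, bell in BELL.items():
        chi = bob_state(psi, bell)
        prob = sum(abs(x) ** 2 for x in chi)
        best = max(PAULIS, key=lambda g: fidelity(psi, apply(PAULIS[g], chi)))
        results[name] = (prob, best, fidelity(psi, apply(PAULIS[best], chi)))
    return results

mystery = [0.6, 0.8j]               # an arbitrary normalised qubit (assumed example)
results = teleport(mystery)
```

Each of the four outcomes occurs with probability 1/4 and tells Alice which rotation Bob needs, so the two classical bits she sends are what turns Bob's share into a faithful replica.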
Wavelets have emerged in the last decade as a synthesis from many disciplines, ranging from pure mathematics (where
forerunners were used to study singular integral operators) to electrical engineering (quadrature mirror filters), borrowing in
passing from quantum physics, from geophysics and from computer aided design.

The first part of the talk will present an overview of the ideas in wavelet theory, and show how it fits into the different
disciplines in which it is rooted. The second part of the talk will discuss some recent applications, such as, in particular, a
nonlinear “squeezing” of the wavelet transform, inspired by auditory models, with applications to speech processing; and a
discussion of nonlinear approximation and why wavelets are so successful in nonlinear approximation.
Abstract — The problem of minimizing a functional
over a convex set of non-negative functions is considered,
when the functional to be minimized is an
$f$-entropy, or $f$-divergence resp. Bregman distance
from a given function.

I.  Motivation 

The motivation for this paper is the problem of inferring a
function $p(x)$ on a set $X$ when the only available information
is $p \in E$, where $E$ is a known convex set of functions on $X$.
Possibly a prior guess $q$ is also available, namely $p(x) = q(x)$
would be inferred were $q \in E$. A familiar method is to take
that $p \in E$ which minimizes a certain functional.

II.  Maximum-entropy  type  methods 
For inferring non-negative functions, it is usual to minimize
one of the functionals

$$J_f(p) = \int f(p)\,d\mu, \qquad D_f(p,q) = \int q\, f\!\left(\frac{p}{q}\right) d\mu, \tag{1}$$

$$B_f(p,q) = \int \left[ f(p) - f(q) - f'(q)(p-q) \right] d\mu, \tag{2}$$

called $f$-entropy, $f$-divergence and Bregman distance, respectively.
Here $f$ is a strictly convex differentiable function on
$\mathbb{R}_+$ and $\mu$ is a $\sigma$-finite measure on $X$. $B_f(p,q)$ is a distance in
the sense that it is non-negative and equals 0 iff $p = q$ [$\mu$].
$D_f(p,q)$ is also a distance if $f(1) = f'(1) = 0$.

The choice $f_1(t) = t \log t - t + 1$ gives the method of maximum
entropy or ME ($J_{f_1}(p)$ for a probability density $p$ is
negative Shannon entropy, and $D_{f_1} = B_{f_1}$ is Kullback-Leibler
I-divergence). Other familiar choices are $f_0(t) = -\log t + t - 1$,
leading to Burg's method and to minimizing reversed
I-divergence, and $f_\alpha(t) = [t^\alpha - \alpha t + \alpha - 1]\,\mathrm{sign}(\alpha - 1)$, $\alpha > 0$.
There are strong arguments, both probabilistic and axiomatic,
that support ME, cf. [1], [3]. For axiomatic justifications of
alternative methods with some other $f$ cf. [1], [4]. A probabilistic
justification of these methods can be given by an extension
of ME [2] in the case when $f$ can be represented as the
convex conjugate of the log of the moment generating function
of a non-negative valued random variable. Among the
functions $f_\alpha$ above, those with $0 < \alpha < 1$ have this property.
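
For readers who want to experiment, the functionals (1) and (2) can be evaluated on a finite set. The sketch below assumes $X$ is finite, $\mu$ is counting measure and $q$ is strictly positive; the function names are hypothetical. It checks numerically that for $f_1$ and probability vectors both $D_{f_1}$ and $B_{f_1}$ reduce to the Kullback-Leibler I-divergence.

```python
import math

def f1(t):
    """f_1(t) = t log t - t + 1, extended by continuity to f_1(0) = 1."""
    return t * math.log(t) - t + 1 if t > 0 else 1.0

def f1_prime(t):
    """Derivative of f_1 for t > 0."""
    return math.log(t)

def f_divergence(p, q, f=f1):
    """D_f(p, q) = sum_x q(x) f(p(x) / q(x)) with counting measure."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

def bregman_distance(p, q, f=f1, f_prime=f1_prime):
    """B_f(p, q) = sum_x [f(p) - f(q) - f'(q)(p - q)] with counting measure."""
    return sum(f(pi) - f(qi) - f_prime(qi) * (pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Kullback-Leibler I-divergence in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
d = f_divergence(p, q)
b = bregman_distance(p, q)
kl = kl_divergence(p, q)
```

The agreement relies on $p$ and $q$ having the same total mass; for general non-negative functions the extra terms $-\int p\,d\mu + \int q\,d\mu$ do not cancel.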

III.  Main  results 

Theorem 1: Let $E$ be a convex set of non-negative functions
such that the infimum for $p \in E$ of $J_f(p)$, $D_f(p,q)$ or $B_f(p,q)$
is finite. Then each sequence $\{p_n\} \subset E$ approaching this
infimum converges to a function $p^*$ in the sense of convergence
in measure on every set with finite $\mu$-measure, providing in the
case of $D_f$ that either $q > 0$ [$\mu$] or

$$\lim_{t \to \infty} f'(t) = \infty. \tag{3}$$

Moreover, the difference of $J_f(p)$ resp. $B_f(p,q)$ from its infimum
is lower bounded by $B_f(p, p^*)$, for every $p \in E$.

1 This work was supported by the Hungarian National Foundation
for Scientific Research, Grant 1906.

Notice that here $p^*$ does not necessarily belong to $E$. The
minimum of the considered functional over $E$ is attained iff
$p^* \in E$. If $p^* \notin E$, it is considered a generalized solution
of the minimization problem or (in the case of $D_f$ or $B_f$) a
generalized projection of $q$ onto $E$.

Theorem 2: The statement of Theorem 1 can be strengthened
to convergence in $L_1(\mu)$ norm

(a) for $J_f$, if $\mu$ is a finite measure and (3) holds,

(b) for $D_f$, if $q \in L_1(\mu)$ and (3) holds,

(c) for $B_f$, if $\mu$ is a finite measure, $q \in L_1(\mu)$, and

$$\inf_{v > 1} \left( f'(Kv) - f'(v) \right) > 0 \quad \text{for some } K > 1. \tag{4}$$

Corollary: Under the conditions in Theorem 2, the $L_1(\mu)$
closedness of $E$ is a sufficient condition for $p^* \in E$, i.e., for the
existence of a (unique) solution of the minimization problem.

Remark: (4) is a stronger hypothesis than (3), but for the
functions $f_\alpha$ either holds iff $\alpha \ge 1$. When (3) is not satisfied,
no good sufficient conditions are available for $p^* \in E$.

In most applications, the feasible set $E$ is defined by linear
constraints,

$$E = \left\{ p : \int a_\gamma(x)\, p(x)\, \mu(dx) = b_\gamma,\ \gamma \in \Gamma \right\}. \tag{5}$$

Then, by the above Corollary, under the hypotheses of Theorem 2
the boundedness of the functions $a_\gamma$ is a sufficient
condition for $p^* \in E$. For the functionals (1), a somewhat
weaker sufficient condition is given in

Theorem 3: Under the hypotheses of Theorem 2 (a) or (b),
the finiteness of $\int f^*(\lambda |a_\gamma|)\,d\mu$ or $\int f^*(\lambda |a_\gamma|)\, q\,d\mu$ for every
$\lambda > 0$ and $\gamma \in \Gamma$ is sufficient for $p^* \in E$. Here $f^*$ denotes the
convex conjugate of $f$.
Abstract — Parallel independent channels where
no encoding is allowed for one of the channels are
studied. The Slepian-Wolf theorem on source coding
of correlated sources is used to show that any
information source whose entropy rate is below the
sum of the capacity of the coded channel and the
input/output mutual information of the uncoded channel
is transmissible with arbitrary reliability. The
converse is also shown. Thus, coding of the side information
channel is unnecessary when its mutual information
is maximized by the source distribution.
An information-theoretic interpretation of
Parallel-Concatenated channel codes and, in particular, Turbo
codes is put forth.

I.  Model 

Consider the model depicted in Figure 1 where two independent
channels operate in parallel. If the inputs to both
channels were allowed to be encoded, then Shannon's coding
theorem tells us that the source is reliably transmissible
provided its entropy rate is below the sum $C_1 + C_2$ of the channel
capacities; conversely, if the source entropy rate exceeds
$C_1 + C_2$ then reliable transmission is not possible. The new
twist in the model in Figure 1 is that the information going
through channel 2 is not encoded. The following practical scenarios
which fit into this model are studied in this paper: an
existing uncoded communication link is to be upgraded with
the addition of a coded channel in order to provide reliable
transmission; the receiver obtains a noisy version of the raw
data in addition to the coded channel output; a single channel
is time-multiplexed into several independent subchannels.


Fig.  1:  Channel  with  Uncoded  Side  Information 

II.  Coding  Theorem 

Our main result states that the source can be transmitted reliably
provided that its conditional entropy rate given the output
of the uncoded channel, $H(X|Z)$, is below the capacity $C_1$ of
channel 1, and, conversely, it cannot be transmitted reliably if
the conditional entropy rate exceeds $C_1$.

This result suggests that we view the information rate
of the source as split into two nonoverlapping components,
$H(X) = H(X|Z) + I(X;Z)$. Even though the information
quantified by the second term is transmitted uncoded, the
source is reproducible with arbitrary reliability at the output.
If, furthermore, the source is matched to the uncoded channel
in the sense that it maximizes its input/output mutual information,
then it is possible to transmit information at rate
$C_1 + C_2$ even though no coding is provided for the information
going through one of the channels. This implies that the sum
of the capacities of $K$ independent parallel binary symmetric
channels can be achieved even if only one of them is encoded.
This observation is most striking when the encoded BSC has
very poor crossover probability.
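
The split $H(X) = H(X|Z) + I(X;Z)$ is easy to evaluate for a simple case. The sketch below assumes a uniform binary memoryless source sent uncoded over a binary symmetric channel, an example chosen here for illustration; the function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def rate_split_uniform_bsc(crossover):
    """Return (H(X|Z), I(X;Z)) in bits per symbol for a uniform binary source X
    observed through a BSC with the given crossover probability."""
    h_x = 1.0                                  # entropy of the uniform source
    h_x_given_z = binary_entropy(crossover)    # by symmetry of the BSC
    return h_x_given_z, h_x - h_x_given_z

# Rate the coded channel must support, and rate carried by the uncoded channel.
coded_part, uncoded_part = rate_split_uniform_bsc(0.11)
```

In this example the uniform source is matched to the uncoded channel, so the second component equals that channel's capacity and the coded channel only needs capacity above $H(X|Z)$, about half a bit per symbol for crossover probability 0.11.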

Our coding theorem is proved under very mild conditions
on the channels and the source. The source and the output
of the uncoded channel are assumed to be jointly
ergodic/stationary and the coded channel is assumed to be such
that its capacity is equal to the limit of maximal mutual
informations.

To prove the converse part of the result we show that even
if the encoder were to observe the output of the uncoded channel,
it would not be possible to send information at a faster
rate. The proof of the achievability part is by construction of
an encoder where the source coding and channel coding operations
are performed separately. The source encoder does
not operate at the full entropy rate of the source. Rather it
is a Slepian-Wolf encoder [1] operating at rate $H(X|Z)$. In
the special case of binary-input memoryless channels, optimal
encoding is possible by restricting attention to linear codes.

III.  Parallel-Concatenated  Codes 

Parallel-Concatenated codes, and in particular Turbo codes
[2], exhibit favorable complexity/performance tradeoffs. They
can be cast within the model of this paper by considering a
single channel time-multiplexed into several independent subchannels.
For example, one subchannel transmits the uncoded
raw data (the Turbo codes are systematic), and two parallel
channels are driven by partial encoders which can be viewed as
joint source-channel encoders driven by a redundant source.
A practically appealing way to ensure that the information encoded
by the partial encoders is nonoverlapping is by prepending
a sufficiently long interleaver at the input of one of the encoders.
This setup is more attractive than simply multiplexing
the source because of the complexity reductions of combined
source/channel coding with high compression ratios. Good
component codes in Parallel-Concatenated schemes are able
to trade to some extent the traditional role of reducing the
uncertainty of the source given the channel outputs for the
easier goal of preserving mutual information.

Acknowledgements 

This work was supported in part by a grant from the
U.S.-Israel Binational Science Foundation.

References 

[1] D. Slepian and J. K. Wolf, “Noiseless Coding of Correlated Information
Sources”, IEEE Trans. Information Theory, IT-19, pp.
471-480, July 1973.

[2] C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon
Limit Error Correcting Coding and Decoding: Turbo-Codes”,
Proc. International Conference on Communications, Geneva,
Switzerland, pp. 1064-1070, May 23-26, 1993.




Zero-Error  List  Capacities  of  Discrete  Memoryless  Channels 

İ. Emre Telatar

Room  2C-174,  AT&T  Bell  Laboratories,  Murray  Hill,  NJ  07974,  USA 
telatar@research.att.com 


Abstract — We define zero-error list capacities for
discrete memoryless channels. We find lower bounds
to, and a characterization of, these capacities. As
is usual for such zero-error problems in information
theory, the characterization is not generally a single-letter
one. Nonetheless, we exhibit a class of channels
for which a single-letter characterization exists. We
also show how the computational cutoff rate relates
to the capacities we have defined.

I.  Introduction 

It is sometimes desirable that the decoder of a communication
system declare not just one, but several estimates of
the transmitted data. For example, the encoder and the decoder
may be the inner code of a more complex transmission
system; the structure of the outer code can then be used to
choose among the estimates the inner code provides. A decoder
that may produce more than one estimate is called a
list decoder.

Suppose we are given a discrete memoryless channel (DMC)
with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$ and transition
probabilities $\{P(y|x),\ y \in \mathcal{Y},\ x \in \mathcal{X}\}$.

Let $C$ be a block code of length $n$ for $P$. A zero-error list
decoder for $C$ is a decoder that assigns to every output $y \in \mathcal{Y}^n$
the set of codewords $\mathcal{L}(y, C) \subset C$ that could have produced
that output with positive probability: $\mathcal{L}(y, C) = \{c \in C :
P^n(y|c) > 0\}$. Let $L(y, C) = |\mathcal{L}(y, C)|$ be the size of the list.
The uniform distribution on $C$ induces a distribution on $L$,
and we will be interested in the moments of $L$.
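
The list decoder and the moments of the list size can be computed by brute force for a toy code. The sketch below is an illustration only: the function names are hypothetical, and the channel (a binary erasure channel with erasure probability 1/2) and the length-2 repetition code are assumed examples, not taken from the paper.

```python
from itertools import product

def zero_error_list(y, code, P):
    """Codewords c with P^n(y|c) > 0, where P[x][y] is the single-letter
    transition probability of the memoryless channel."""
    return [c for c in code if all(P[x][yi] > 0 for x, yi in zip(c, y))]

def list_size_moment(code, P, rho, outputs):
    """rho-th moment of the list size when codewords are equally likely."""
    n = len(code[0])
    total = 0.0
    for c in code:
        for y in product(outputs, repeat=n):
            prob = 1.0
            for x, yi in zip(c, y):
                prob *= P[x][yi]
            if prob > 0:
                total += prob * len(zero_error_list(y, code, P)) ** rho
    return total / len(code)

# Binary erasure channel with erasure symbol 'e' (assumed example).
P = {0: {0: 0.5, "e": 0.5, 1: 0.0}, 1: {0: 0.0, "e": 0.5, 1: 0.5}}
code = [(0, 0), (1, 1)]
first_moment = list_size_moment(code, P, 1, [0, "e", 1])
second_moment = list_size_moment(code, P, 2, [0, "e", 1])
```

Only the all-erasure output leaves both codewords on the list, so the average list size is 1.25 and the second moment is 1.75 in this example.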

For any $\rho > 0$ and $P$ define the zero-error $\rho$th-moment list
capacity $C_{0\ell}(\rho, P)$ as the largest rate $R$ such that for all $\epsilon > 0$
there exists a code of rate at least $R$ for which the $\rho$th moment
of the list size is at most $1 + \epsilon$.

II.  Summary  of  Results 

To state our results, we introduce

$$F_0(\rho, P) = \max_{Q} \ \min_{\substack{V, W :\ V, W \ll P \\ WQ = VQ}} \rho\, I(Q, W) + D(V \| P \mid Q),$$

where $Q$ ranges over the distributions on the input alphabet
of $P$, $D(V \| P \mid Q)$ is the conditional informational divergence
and $I(Q, W)$ is the mutual information. In the minimization
$V$ and $W$ range over the set of channels with the same input
and output alphabets as $P$, the notation $VQ = WQ$ means
that the distributions on the output alphabet of the channels
$V$ and $W$ are the same when their inputs have distribution $Q$,
and $V \ll P$ means that $V(j|k) = 0$ whenever $P(j|k) = 0$.

Theorem 1 For all $\rho > 0$, $C_{0\ell}(\rho, P) \ge F_0(\rho, P)/\rho$. Moreover,
if we compute the lower bound for $P^n$, normalize, and
pass to the limit, $C_{0\ell}(\rho, P) = \lim_{n \to \infty} n^{-1} F_0(\rho, P^n)/\rho$.

The case of $\rho = 1$ is of particular interest; the corresponding
capacity $C_{0\ell}(1, P)$ is called the zero-error average list size
capacity. The substitution $\rho = 1$ in Theorem 1 recovers the
results of [1].

Another special case is obtained by letting $\rho$ become vanishingly
small. The constraint on the $\rho$th moment of the list
size is then equivalent to demanding that $\Pr[L > 1]$ gets arbitrarily
small. Taking the limit as $\rho \to 0$ in Theorem 1 we
recover the previously known lower bound for zero-undetected-error
capacity $C_{0u}$ [1, 2, 3].

As a further special case, consider the limit $C_{0\ell}(\infty, P) =
\lim_{\rho \to \infty} C_{0\ell}(\rho, P)$.

Theorem 2 $C_{0\ell}(\infty, P) = \min_{W : W \ll P} C(W)$, where $C(W) =
\max_Q I(Q, W)$ is the ordinary capacity of a discrete memoryless
channel $W$.

We have thus seen that $C_{0\ell}(\infty, P)$ has a single-letter
characterization. A more surprising result is that for a special
class of DMCs one can obtain a single-letter expression for
$C_{0\ell}(\rho, P)$ for any $\rho > 0$:

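Theorem 3 Given a DMC $P$ with input alphabet $\mathcal{X}$ and output
alphabet $\mathcal{Y}$, construct the bipartite graph $G(P)$ with vertices
$\mathcal{X} \cup \mathcal{Y}$ and edges $\{(x, y) : x \in \mathcal{X},\ y \in \mathcal{Y},\ P(y|x) > 0\}$.
If $G(P)$ is acyclic then $C_{0\ell}(\rho, P) = E_0(\rho, P)/\rho$, where
$E_0(\rho, P) = \max_Q -\ln \sum_y \left[ \sum_x Q(x) P(y|x)^{1/(1+\rho)} \right]^{1+\rho}$.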
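
The expression inside the maximum in Theorem 3 can be evaluated directly for a given input distribution. The sketch below does not perform the maximization over $Q$; it evaluates the function at a fixed $Q$, and the function name, the matrix layout `P[x][y]` and the binary erasure channel example are assumptions made for illustration. The graph of a binary erasure channel is a path, hence acyclic.

```python
import math

def gallager_e0(rho, Q, P):
    """-ln sum_y [ sum_x Q(x) P(y|x)^(1/(1+rho)) ]^(1+rho), in nats,
    for a fixed input distribution Q and channel matrix P[x][y]."""
    total = 0.0
    for y in range(len(P[0])):
        inner = sum(Q[x] * P[x][y] ** (1.0 / (1.0 + rho)) for x in range(len(Q)))
        total += inner ** (1.0 + rho)
    return -math.log(total)

# Binary erasure channel, outputs ordered (0, erasure, 1); assumed example.
eps = 0.25
P = [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]
Q = [0.5, 0.5]
rho = 1.0
e0 = gallager_e0(rho, Q, P)
closed_form = -math.log(eps + (1 - eps) * 2 ** (-rho))
```

For the uniform input distribution the sum collapses to $\epsilon + (1-\epsilon)2^{-\rho}$, which gives a convenient check of the numerical routine.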

The quantity $E_0(\rho)/\rho$ is the largest rate for which the $\rho$th
moment of the number of computations per symbol remains
bounded in sequential decoding [4, 5]. Theorem 3 is similar
to the result of [6] where it is shown that for the same class
of channels the zero-undetected-error capacity $C_{0u}$ is identical
to the ordinary capacity $C$.
Abstract — We answer the question, what should
we say about $V$ when we want to gamble on $X$, and
what is it worth? If $V = X$, we show that every bit of
description at rate $R$ is worth a bit of increase $\Delta(R)$
in the doubling rate. Thus the efficiency $\Delta(R)/R$ is
equal to 1. For general $V$, we provide a single letter
characterization for $\Delta(R)$. When applied specifically
to jointly normal $(V, X)$ with correlation $\rho$, we find the
initial efficiency $\Delta'(0)$ is $\rho^2$. If $V$ and $X$ are Bernoulli
random variables connected by a binary symmetric
channel with parameter $p$, the initial efficiency is
$(1 - 2p)^2$.

We  finally  show  how  much  increase  in  doubling  rate 
is  possible  when  the  sender  can  provide  R  bits  of  in¬ 
formation  about  V  and  side  information  S  is  available 
only  to  the  investor. 

Summary 

Suppose  we  are  interested  in  gambling  on  the  outcome  of  a 
random  variable  X.  The  gambling  consists  of  betting  a  pro¬ 
portion  of  wealth  b{x)  on  the  outcome  x.  We  would  like  to 
maximize  the  doubling  rate,  which  is  the  growth  rate  of  wealth 
when  the  gambler  uses  a  fixed  betting  strategy  on  independent 
realizations  of  X.  It  is  well  known  that  Kelly  gambling,  which 
is  to  bet  in  proportion  to  the  probability  mass  function  of  X , 
is  optimal. 

Now suppose we allow a description of $X$ at rate $R$ bits per symbol. Let $\Delta(R)$ be the maximum increase in the doubling rate of wealth for transmission rate of $R$. We prove that $\Delta(R) = R$. Any bit of information one sends about $X$ is worth a bit of increase in the doubling rate.
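A small worked case makes the claim concrete. The sketch below is an illustration under assumed conditions that are not in the summary: $X$ is uniform on four outcomes, the odds are fair 4-for-1, and the one-bit description tells the gambler which half of the outcomes contains $X$. The helper name `doubling_rate` is hypothetical.

```python
import math

def doubling_rate(p, b, odds):
    """Expected log2 wealth growth per bet: sum_x p(x) * log2(b(x) * odds(x))."""
    return sum(px * math.log2(bx * ox) for px, bx, ox in zip(p, b, odds) if px > 0)

p = [0.25, 0.25, 0.25, 0.25]   # assumed distribution of X
odds = [4.0, 4.0, 4.0, 4.0]    # assumed fair 4-for-1 odds

# No description: Kelly gambling bets b = p.
w_plain = doubling_rate(p, p, odds)

# One bit of description: the gambler learns whether X is in {0, 1} or {2, 3}
# and bets in proportion to the conditional distribution.
w_one_bit = 0.0
for group in ([0, 1], [2, 3]):
    p_group = sum(p[x] for x in group)
    cond = [p[x] / p_group if x in group else 0.0 for x in range(len(p))]
    w_one_bit += p_group * doubling_rate(cond, cond, odds)

print(w_plain, w_one_bit, w_one_bit - w_plain)
```

Under these assumptions the one-bit description raises the doubling rate by exactly one bit, in line with $\Delta(R) = R$ at $R = 1$.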

We  next  consider  the  effectiveness  of  sending  information 
when  side  information  S  is  available  to  the  investor  but  not 
to  the  encoder.  The  gambler  combines  this  side  information 
with  the  partial  description  of  X  to  form  the  bet. 

Theorem 1 If $X$ is described at rate $R$, and side information $S$ is available to the gambler, then

$\Delta(R) = R.$

We  ask  what  information  should  be  given  about  a  corre¬ 
lated  random  variable  V  if  we  want  to  help  the  investor  gam¬ 
ble  on  X.  This  problem  shows  some  similarities  to  source 
coding  with  side  information  [4,  1].  The  encoder  sends  R  bits 
about  V  and  the  investor  uses  this  information  to  gamble  on 
X.  Here  maximal  efficiency  is  not  generally  possible. 

Theorem 2 When the encoder observes $V$ correlated with $X$,

$$\Delta(R) = \max_{p(\hat v | v, x):\ I(\hat V; V) \le R,\ \hat V \to V \to X} I(\hat V; X).$$

¹This work was supported by NSF Grant NCR-9205663, ARPA Contract J-FBI-94-218 and JSEP Contract DAAH04-94-G-0058.

We establish certain properties of $\Delta(R)$ using entropy maximization results from Witsenhausen and Wyner [3].

Next,  we  find  the  increase  in  the  doubling  rate  when  the 
encoder  sends  information  at  rate  R  about  a  correlated  ran¬ 
dom  variable  V  with  side  information  S  present  only  at  the 
investor.  The  investor  uses  these  R  bits  together  with  the  side 
information  S  to  invest  in  the  outcome  of  X. 

Theorem 3 When the encoder observes $V$, and side information $S$ is available at the investor,

$$\Delta(R) = \max_{p(\hat v | v, x, s):\ I(\hat V; V | S) \le R,\ \hat V \to V \to (X, S)} I(\hat V; X | S).$$

Finally, we investigate the efficiency of descriptions based on correlated variables. If $X$ and $V$ are both Bernoulli(1/2) and are associated by a binary symmetric channel with crossover probability $p$, it can be shown that $\Delta(R)$ has a derivative of $(1 - 2p)^2$ at $R = 0$. Thus, even the most effective description of $V$ relative to the investment in $X$ pays off at the rate of only $(1 - 2p)^2$ bits of doubling per bit of description.

Now suppose that $V$ and $X$ are jointly Gaussian with correlation $\rho$. In this case the initial efficiency, $\Delta'(0)$, is equal to $\rho^2$.

The functional form of $\Delta(R)$ for binary and Gaussian random variables will be developed in [2]. Also, the relationship between the derivative of $\Delta(R)$ at $R = 0$ and the Rényi maximal correlation of $V$ and $X$ will be investigated.
Abstract — In some applications, channel noise is the sum of a Gaussian noise and a relatively weak non-Gaussian contaminating noise. Although the capacity of such channels cannot be evaluated in general,
we  analyze  the  decrease  in  capacity,  or  sensitivity  of 
the  channel  capacity  to  the  weak  contaminating  noise. 
We  show  that  for  a  very  large  class  of  contaminating 
noise  processes,  explicit  expressions  for  the  sensitivity 
of  a  discrete-time  channel  capacity  do  exist.  Sensitiv¬ 
ity  is  shown  to  depend  on  the  contaminating  process 
distribution  only  through  its  autocorrelation  function 
and  so  it  coincides  with  the  sensitivity  with  respect  to 
a  Gaussian  contaminating  noise  with  the  same  auto¬ 
correlation  function.  A  key  result  is  a  formula  for  the 
derivative  of  the  water-filling  capacity  with  respect  to 
the  contaminating  noise  power. 

Parallel  results  for  the  sensitivity  of  rate-distortion 
function  relative  to  a  mean-square-error  criterion  of 
almost  Gaussian  processes  are  obtained. 


I.  Sensitivity  of  Channel  Capacity 

Consider  a  discrete-time  stationary  channel: 

$$Y_j = X_j + N_j + \theta Z_j \qquad (1)$$

We assume that the random sequences $X = \{X_j\}$, $N = \{N_j\}$ and $Z = \{Z_j\}$ are second-order and mutually independent. The nominal noise $N$ is Gaussian, $E N_j = E Z_j = 0$, $E N_j^2 = \sigma^2$, $E Z_j^2 = 1$. Denote by $C_P(\theta)$ the capacity of channel (1) under the assumption that the input power is constrained to some fixed constant $P$. The sensitivity of channel capacity with respect to the contaminating noise power is defined as

$$S_P = \lim_{\theta \to 0} \frac{C_P(0) - C_P(\theta)}{\theta^2} \qquad (2)$$


II. Gaussian Contamination

If the contaminating process $\{Z_j\}$ is Gaussian, then the capacity of (1) admits the well-known water-filling solution

$$C_P(\theta) = \frac{1}{2} \int_{-1/2}^{1/2} \log\left(1 + \frac{[K_\theta - N_0(f) - \theta^2 Z(f)]^+}{N_0(f) + \theta^2 Z(f)}\right) df \qquad (3)$$

where $N_0(f)$ and $Z(f)$ are the power spectral densities of the nominal and contaminating noises, respectively, and the water level $K_\theta$ is adjusted so that the integral of the optimum input power spectral density $S_\theta(f)$ is equal to $P$, where $S_\theta(f)$ is the numerator in (3).

We show in this paper that the sensitivity of the water-filling channel capacity formula admits the following simple expression:

[The display for equation (4) did not survive in the source.]

where $K_0$ is the nominal water level. It follows that the sensitivity is maximized by a contaminating random process which concentrates its power at those frequencies where the nominal noise spectral density is minimum. Note that the worst-case sensitivity is minimized over the nominal noise spectral density by white noise, in which case the sensitivity is equal to

$$\frac{1}{2\sigma^2} \cdot \frac{P}{P + \sigma^2} \qquad (5)$$

regardless of the power spectral density of the contaminating process.

III. NonGaussian Contamination

Since Gaussian noise minimizes capacity for a given power spectral density, the expression in (4) is an upper bound to sensitivity for nonGaussian contamination. Despite the lack of an expression for $C(\theta)$ in the nonGaussian case, this paper shows that

• The sensitivity is equal to (5) if the nominal Gaussian noise is white and the contaminating noise is regular (cf. [2]).

• The sensitivity is equal to (4) if both the nominal and contaminating noises are regular and if the ratio of spectral densities of contaminating to nominal noises, $Z(f)/N_0(f)$, is bounded on $[0, \tfrac{1}{2}]$.

• The sensitivity is equal to 0 if the nominal noise is regular and the contaminating noise is entropy-singular.

IV. Rate-Distortion Function

Consider the random process $N + \theta Z$ and denote by $R_D(\theta)$ its rate-distortion function relative to the mean-square-error criterion. We have shown (under the same conditions as above) that if $D < \sigma^2$, then the sensitivity of the rate-distortion function is

[The display for this equation did not survive in the source.]

where $A_0$ is defined by the equation

$$D = \int_{-1/2}^{1/2} \min\{A_0, N_0(f)\}\, df.$$
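The white-noise sensitivity can be checked numerically. The sketch below is an illustration, not the authors' computation: it assumes Gaussian contamination, white nominal noise of power $\sigma^2 = 1$, input power $P = 1$, and a hypothetical contaminating spectrum $Z(f) = 1 + \cos 2\pi f$ (unit power, not white). It compares a finite-difference estimate of $[C_P(0) - C_P(\theta)]/\theta^2$ with $P / (2\sigma^2 (P + \sigma^2))$, the value obtained by differentiating $\tfrac{1}{2}\log(1 + P/(\sigma^2 + \theta^2))$ with respect to $\theta^2$ at $\theta = 0$.

```python
import math

def water_filling_capacity(noise_psd, power):
    """0.5 * average over f of log(1 + S(f)/N(f)) in nats, with S(f) = max(K - N(f), 0)."""
    lo, hi = min(noise_psd), max(noise_psd) + power
    for _ in range(200):  # bisection on the water level K
        k = 0.5 * (lo + hi)
        used = sum(max(k - n, 0.0) for n in noise_psd) / len(noise_psd)
        if used < power:
            lo = k
        else:
            hi = k
    k = 0.5 * (lo + hi)
    return 0.5 * sum(math.log(max(k, n) / n) for n in noise_psd) / len(noise_psd)

sigma2, P, theta2 = 1.0, 1.0, 1e-3
grid = [(i + 0.5) / 4096 - 0.5 for i in range(4096)]       # midpoints of (-1/2, 1/2)
z_psd = [1.0 + math.cos(2 * math.pi * f) for f in grid]    # assumed contaminating PSD
nominal = [sigma2] * len(grid)
contaminated = [sigma2 + theta2 * z for z in z_psd]

finite_diff = (water_filling_capacity(nominal, P)
               - water_filling_capacity(contaminated, P)) / theta2
closed_form = P / (2 * sigma2 * (P + sigma2))
print(finite_diff, closed_form)
```

The two numbers agree up to terms of order $\theta^2$, and replacing `z_psd` by any other unit-power spectrum leaves the limit unchanged, which is the sense in which the white-noise sensitivity does not depend on the contaminating spectrum.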
Abstract — A graphical calculus is presented for determining the independence and conditional independence of random variables in a specified probabilistic setting. The calculus is developed first for the case of random variables that form a Markov chain. The calculus is then extended to the “general causal case” where the random variables are obtained from a sequence of random experiments in which each experiment can be carried out in full when the results of specified previous experiments are made available to it.

I.  Introduction 

Because mutual information is essentially a measure of probabilistic dependence, information theory can be used to devise a convenient calculus for reasoning about probabilistic dependence. For example, because $I(X; Y) \ge 0$ with equality if and only if the random variables $X$ and $Y$ are independent, it follows that the determination of whether $X$ and $Y$ are independent is equivalent to determining whether $I(X; Y)$ vanishes. Moreover, the vanishing of $I(X; Y)$ can alternatively and conveniently be taken as the definition of (probabilistic) independence. Similarly, the vanishing of the conditional mutual information $I(X; Y | Z)$ can be taken as the definition of the independence of $X$ and $Y$ when conditioned on knowledge of $Z$.

Conditional  independence  will  be  seen  to  play  an  important 
role  in  the  study  of  probabilistic  dependence.  Independence 
and  conditional  independence  are  in  general  unrelated  prop¬ 
erties  of  random  variables  in  the  sense  that  X  and  Y  can  be 
independent  but  dependent  when  conditioned  on  Z  and,  con¬ 
versely,  X  and  Y  can  be  dependent  but  independent  when 
conditioned  on  Z . 
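Both directions of this remark can be seen in two three-bit examples. The sketch below is illustrative and not part of the paper: it assumes joint distributions stored as dictionaries keyed by outcome tuples $(x, y, z)$, and the helper names are hypothetical. In the first example $Z = X \oplus Y$ for independent fair bits; in the second $X = Y = Z$ is a single fair bit.

```python
import math
from collections import defaultdict
from itertools import product

def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx."""
    out = defaultdict(float)
    for outcome, pr in joint.items():
        out[tuple(outcome[i] for i in idx)] += pr
    return out

def cond_mutual_info(joint, a, b, c=()):
    """I(A; B | C) in bits; a, b, c are tuples of coordinate indices."""
    pabc, pac = marginal(joint, a + b + c), marginal(joint, a + c)
    pbc, pc = marginal(joint, b + c), marginal(joint, c)
    total = 0.0
    for outcome, pr in joint.items():
        if pr == 0:
            continue
        def key(idx):
            return tuple(outcome[i] for i in idx)
        total += pr * math.log2(pabc[key(a + b + c)] * pc[key(c)]
                                / (pac[key(a + c)] * pbc[key(b + c)]))
    return total

xor_joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}
copy_joint = {(b, b, b): 0.5 for b in (0, 1)}

for name, joint in (("xor", xor_joint), ("copy", copy_joint)):
    print(name,
          cond_mutual_info(joint, (0,), (1,)),          # I(X; Y)
          cond_mutual_info(joint, (0,), (1,), (2,)))    # I(X; Y | Z)
```

In the XOR example conditioning creates one bit of dependence out of none; in the copy example conditioning removes the one bit of dependence that was there.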

II.  Markov  Chains 

A Markov chain can alternatively and conveniently be defined as a sequence $X_1, X_2, \ldots, X_n$ of random variables such that, for all $i$ strictly between 1 and $n$, $[X_1, X_2, \ldots, X_{i-1}]$ and $[X_{i+1}, X_{i+2}, \ldots, X_n]$ are independent when conditioned on $X_i$. An immediate consequence of the symmetry of mutual information, i.e., of the fact that $I(X; Y | Z) = I(Y; X | Z)$, is that the reversed sequence $X_n, X_{n-1}, \ldots, X_1$ is also a Markov chain, which is a well-known fact but one that is awkward to prove from the usual definition of a Markov chain. Another immediate consequence of this alternative definition of a Markov chain is that any subsequence of a Markov chain $X_1, X_2, \ldots, X_n$ is also a Markov chain, which again is a well-known fact that is awkward to prove from the usual definition.

The  following  result  is  as  useful  in  formulating  a  calculus 
of  dependence  as  it  is  trivial  to  prove. 

Proposition 1 (Independence Inheritance)

If $I(WX; Z | Y) = 0$, then also $I(X; Z | Y) = 0$ and $I(X; Z | WY) = 0$.

In other words, if some (possibly conditional) mutual information is zero, then any random variable not in the conditioning can be discarded or moved into the conditioning with the mutual information remaining zero.

The above proposition is the basis for the following calculus of independence for Markov chains: The random variables $X_1, X_2, \ldots, X_n$ in the Markov chain are used to label in the natural order the nodes of a simple (undirected) linear graph with $n$ nodes. Then any (possibly conditional) mutual information involving only the random variables $X_1, X_2, \ldots, X_n$ is zero if, for every pair of random variables with one to the left and one to the right of the semicolon in the mutual information expression, there is a random variable in the conditioning whose node in the graph lies between the nodes for these two random variables. Moreover, this is the strongest statement that can be made in general about the (conditional) independence of the random variables in a Markov chain in the sense that there are chains for which the given mutual information is non-zero when this condition is not fulfilled. It is thus natural from the graphical viewpoint to think of conditioning as “blocking” dependence between the random variables in a Markov chain.
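The blocking rule for Markov chains reduces to a check on indices. The following sketch is an illustration with a hypothetical function name: random variables are represented by their positions $1, \ldots, n$ in the chain, and the function returns `True` exactly when the rule guarantees that the mutual information vanishes. A `False` result means only that the rule gives no guarantee.

```python
def vanishes_by_blocking(left, right, cond):
    """Graphical rule for a Markov chain X_1, ..., X_n (arguments are index sets).

    True when every pair (a in left, b in right) has some conditioning index
    strictly between a and b, so that I(left; right | cond) must be zero.
    """
    return all(
        any(min(a, b) < c < max(a, b) for c in cond)
        for a in left
        for b in right
    )

# I(X1 X2; X4 X5 | X3): X3 blocks every left/right pair.
print(vanishes_by_blocking({1, 2}, {4, 5}, {3}))
# I(X1; X3 | X4): X4 does not lie between X1 and X3.
print(vanishes_by_blocking({1}, {3}, {4}))
```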

III.  General  Causal  Systems 

The graphical calculus of independence developed for Markov chains can be extended to apply to any random variables that can be described as the results of a sequence of random experiments in which the results of only previous experiments affect the results of following experiments, i.e., the random variables in the sequence have a well-defined causal dependence. The distinction between causal dependence, which is directed, and probabilistic dependence, which is undirected, is crucial to the formulation of this extended graphical calculus. In contrast to the Markov chain case, conditioning can in general create probabilistic dependence between random variables that would be independent without this conditioning.

The real utility of the information-theoretical calculus for analyzing probabilistic dependence becomes evident when considering networks of information sources, channels, encoders and decoders. Precise definitions of all these devices together with the rules for their valid interconnection in networks are required for the precise formulation of the calculus. Examples will be given in the presentation of this paper to illustrate the utility of the calculus in rather complicated networks.
Abstract — This paper gives a simplified treatment of, and new results on, information-theoretic lower bounds on an opponent’s cheating probability in an authentication system with a given key entropy.

I.  Introduction 

Authentication  theory  is  concerned  with  providing  evidence 
to  the  receiver  of  a  message  that  it  was  sent  by  a  specified  le¬ 
gitimate  sender,  even  in  the  presence  of  an  opponent  with  un¬ 
limited  computing  power  who  can  intercept  and  modify  mes¬ 
sages  sent  by  the  legitimate  sender  or  send  fraudulent  mes¬ 
sages  to  the  receiver.  Authenticity  (like  confidentiality)  can 
be  achieved  by  cryptographic  coding  when  sender  and  receiver 
share  a  secret  key. 

Compared  to  Shannon’s  theory  of  secrecy,  authentication 
theory  is  more  subtle  and  involved.  After  some  purely  com¬ 
binatorial  results  on  authentication  theory  had  been  derived 
[1],  Simmons  [4]  initiated  a  sequence  of  research  activities  on 
information-theoretic  lower  bounds  in  authentication  theory 
(e.g.,  see  [2],  [3],  [5],  [6]). 

II.  Description  of  the  authentication  model 

Consider a scenario in which a sender and a receiver share a secret key $Z$. The sender wants to send a sequence of messages $X_1, X_2, \ldots, X_n$, at some independent time instances, in an authenticated manner to the receiver. Each message $X_i$ is authenticated separately by sending an encoded message $Y_i$ which depends (possibly probabilistically) on $Z$, $X_i$, and possibly also on the previous messages $X_1, \ldots, X_{i-1}$. Based on $Y_i$, $Y_1, \ldots, Y_{i-1}$ and $Z$ the receiver decides to either reject the message or accept it as authentic and, in case of acceptance, decodes $Y_i$ to a message $X_i$.

An opponent can use either of two different strategies for cheating. In an impersonation attack at time $i$, the opponent waits until he has seen the encoded messages $Y_1, \ldots, Y_{i-1}$ (which he lets pass to the receiver) and then sends a fraudulent message $Y_i$ which he hopes to be accepted by the receiver as the $i$th message. In a substitution attack at time $i$, the opponent lets pass messages $Y_1, \ldots, Y_{i-1}$, intercepts $Y_i$ and replaces it by a different message $Y^*$ which he hopes to be accepted by the receiver and decoded to a message different from the one sent by the sender. There are three possible goals an opponent might pursue in either of these two attacks:

•  The  receiver  accepts  Yi  as  a  valid  message. 

•  The  receiver  accepts  Yi  and  decodes  it  to  a  message  Xi 
known  to  the  opponent.  In  other  words,  an  opponent  is 
only  considered  successful  if  he  also  guesses  the  receiv¬ 
er’s  decoded  message  Xi  correctly. 

• The receiver accepts $Y_i$ and decodes it to a particular message $X_i = x$ chosen by the opponent. Hence this type of attack depends on a particular value $x$.

*This  work  was  supported  by  the  Swiss  National  Science 
Foundation. 


We will denote the maximal possible probabilities of success, for the three described scenarios, by $P_I(i)$, $P_I'(i)$ and $P_I(i, x)$, respectively, for an impersonation attack at time $i$, and by $P_S(i)$, $P_S'(i)$ and $P_S(i, x)$, respectively, for a substitution attack at time $i$.

III.  Information-theoretic  bounds 

The  literature  on  information-theoretic  bounds  in  authenti¬ 
cation  theory  is  quite  diverse  because  various  different  models 
are  considered.  Generally,  the  proofs  are  quite  complicated 
and  valid  only  for  a  restricted  model  while  the  results  could 
actually  be  proven  for  a  general  model.  For  instance,  some 
proofs  only  hold  for  deterministic  encoding,  for  single  (rather 
than  a  sequence  of)  messages,  for  a  sequence  of  messages  but 
with  the  restrictions  that  the  encoding  rule  be  the  same  for 
each  message  and  that  consecutive  messages  be  distinct,  or 
that  the  encoding  rules  do  not  depend  on  previous  messages. 

The  goal  of  this  paper  is  to  derive  various  bounds  in  a 
coherent,  more  general  setting,  but  by  a  simpler  proof  tech¬ 
nique  than  those  used  before.  In  particular,  we  consider  all 
three  scenarios  described  above  and  our  results  could  be  gen¬ 
eralized  to  a  scenario  where,  for  the  sake  of  a  smaller  cheating 
probability,  also  a  specified  maximal  probability  of  a  decoding 
error  for  a  correct  message  can  be  tolerated. 

Some  of  the  derived  bounds  are  stated  below.  The  first  two 
bounds  were  also  derived  in  [5]  in  a  slightly  less  general  form. 


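$$\begin{aligned}
P_I(i) &\ge 2^{-I(Y_i; Z | Y_1 \ldots Y_{i-1})} \\
P_S(i) &\ge 2^{-H(Z | Y_1 \ldots Y_i)} \\
P_I'(i) &\ge 2^{-I(Y_i; Z | Y_1 \ldots Y_{i-1} X_i)} \\
P_S'(i) &\ge 2^{-H(Z | Y_1 \ldots Y_i X_i)} \\
P_I(i, x) &\ge 2^{-I(Y_i; Z | Y_1 \ldots Y_{i-1}, X_i = x)} \\
P_S(i, x) &\ge 2^{-H(Z | Y_1 \ldots Y_i, X_i = x)}
\end{aligned}$$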
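The impersonation bound for a single message ($i = 1$, no previous codewords) can be verified by brute force on a toy scheme. Everything in the sketch below is an assumption made for illustration and does not come from the paper: the key $Z = (a, b)$ is uniform on $\{0,1\}^2$, the message $X$ is a uniform bit, and the codeword is $Y = (X, \text{tag})$ with the tag equal to $a$ when $X = 0$ and $b$ when $X = 1$.

```python
import math
from itertools import product

keys = list(product((0, 1), repeat=2))  # assumed key space, Z = (a, b)

def encode(x, z):
    """Codeword for message x under key z: the message plus one key bit as tag."""
    return (x, z[x])

def accepts(y, z):
    """Receiver accepts y if its tag matches the key bit selected by the message."""
    x, tag = y
    return z[x] == tag

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as a dict of probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Best impersonation: send the codeword accepted under the most keys.
candidates = list(product((0, 1), repeat=2))
p_impersonation = max(sum(accepts(y, z) for z in keys) / len(keys) for y in candidates)

# Joint distribution of (Y, Z) for a uniform message and key.
joint, p_y, p_z = {}, {}, {}
for z in keys:
    for x in (0, 1):
        pr = 1.0 / (len(keys) * 2)
        y = encode(x, z)
        joint[(y, z)] = joint.get((y, z), 0.0) + pr
        p_y[y] = p_y.get(y, 0.0) + pr
        p_z[z] = p_z.get(z, 0.0) + pr

mutual_info = entropy(p_y) + entropy(p_z) - entropy(joint)  # I(Y; Z)
bound = 2 ** (-mutual_info)
print(p_impersonation, bound)
```

For this toy scheme the codeword reveals one of the two key bits, so $I(Y; Z) = 1$ bit, and the best impersonation succeeds with probability $1/2$: the bound holds with equality.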
Abstract — Finding the input distribution that maximizes mutual information leads, not only to the capacity of the channel, but to engineering insights that tell the designer what good codes should be like. This is due to the folk theorem: The empirical distribution of any good code (i.e., approaching capacity with vanishing probability of error) maximizes mutual information. This paper formalizes and proves this statement.

I.  Introduction 

The unique $n$-dimensional distribution that maximizes the $n$-block input-output mutual information of a binary symmetric channel (BSC) puts equal mass on all $2^n$ binary $n$-strings. Thus, common wisdom in information theory indicates that in order to approach the capacity of a BSC, a code must be such that the ensemble of its equiprobable codewords appears to be generated by a source of independent equally-likely bits. Formalizing and proving such a statement is not trivial as evidenced by the fact that the entropy rate of a source of pure bits is equal to 1 bit, whereas the entropy rate of the channel input induced by $2^{nR}$ equiprobable codewords is equal to $R$, and if the probability of error is to vanish, then $R < 1 - h(p) < 1$. Thus, convergence of the $n$-dimensional input distributions to a Bernoulli-1/2 source is ruled out. A good deal of the intuition on which the above common wisdom is grounded arises from the consideration of the input distributions of random coding, where not only do we average over equiprobable codewords, but over codebooks generated randomly according to the distribution maximizing mutual information. Then, the averaged input distributions of a random code are trivially equal to the capacity achieving input distributions. However, this trivial conclusion predicts nothing about the behavior of the input distributions of any particular code, which is the problem of interest.

It  has  been  shown  in  [1]  that  for  any  finite-input  channel 
that  satisfies  the  strong  converse,  the  output  distribution  in¬ 
duced  by  any  good  code  sequence  converges  (in  normalized 
divergence)  to  the  (unique)  output  distribution  induced  by  a 
capacity  achieving  input  distribution.  In  certain  cases  (such  as 
discrete  memoryless  channels  with  full-rank  transition  matri¬ 
ces  [2]),  such  a  result  implies  convergence  of  the  input  statis¬ 
tics.  However,  in  general,  such  convergence  does  not  follow 
directly  from  the  convergence  of  output  statistics. 

II.  Definitions 

A. Empirical Distributions. For every codeword of a channel code we can find its first-order empirical distribution by computing the fraction of symbols in the codeword equal to each input letter. If for a given codebook we average the empirical distributions over equiprobable codewords we obtain the first-order empirical distribution of the code. Analogously, $k$-th order empirical distributions can be defined by computing for each $k$-string $v$ the fraction of $k$-strings within the codeword equal to $v$. Averaging over equiprobable codewords results in the $k$-th order empirical distribution of the code. Thus, for a code composed of $M$ codewords of blocklength $n$, $\{x_{i,m},\ i = 1, \ldots, n,\ m = 1, \ldots, M\}$, the $k$-th order empirical distribution, $P^{(k)}$, is defined as:

$$P^{(k)}(a_1, \ldots, a_k) = \frac{1}{n-k+1} \sum_{i=1}^{n-k+1} P_i(a_1, \ldots, a_k)$$

where

$$P_i(a_1, \ldots, a_k) = \frac{1}{M} \sum_{m=1}^{M} 1\{x_{i,m} = a_1\} \cdots 1\{x_{i+k-1,m} = a_k\}.$$
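The verbal definition of the $k$-th order empirical distribution translates directly into a counting routine. The sketch below is illustrative (the function name and the two example codebooks are assumptions): it counts every length-$k$ window of every codeword and normalizes by the number of windows, which is the averaging over positions and over equiprobable codewords described above.

```python
from collections import Counter
from itertools import product

def empirical_distribution(codebook, k):
    """k-th order empirical distribution of a code, averaged over equiprobable codewords."""
    n = len(codebook[0])
    counts = Counter()
    for word in codebook:
        for i in range(n - k + 1):          # every window of k consecutive symbols
            counts[tuple(word[i:i + k])] += 1
    total = len(codebook) * (n - k + 1)
    return {block: c / total for block, c in counts.items()}

all_words = [tuple(w) for w in product((0, 1), repeat=4)]   # all 2^4 binary 4-strings
repetition = [(0, 0, 0, 0), (1, 1, 1, 1)]                   # a two-word repetition code

print(empirical_distribution(all_words, 2))
print(empirical_distribution(repetition, 1))
print(empirical_distribution(repetition, 2))
```

The repetition code shows why orders above one matter: its first-order distribution is uniform, yet its second-order distribution puts all mass on the two constant pairs.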


B. Good Codes are channel codes whose rate is close to the channel capacity and whose decoding error probability vanishes with blocklength. More precisely, a good code-sequence for a channel with capacity $C$ is a sequence of $(M, \lambda_n)$ codes such that:

$$\lambda_n \to 0, \qquad \liminf_{n \to \infty} \frac{\log M}{n} = C.$$


III.  Discrete  Memoryless  Channels 

We  have  obtained  results  for  a  variety  of  channels,  includ¬ 
ing  channels  with  memory  and  continuous- alphabet  channels. 
Our main result for discrete memoryless channels (DMC) is

Theorem 1 Consider any good code sequence which does not use any symbol having zero mass under every input distribution that maximizes the single-letter mutual information. Then, the $k$-th order empirical distribution of such a code sequence satisfies:

$$\lim_{n \to \infty}\ \min_{P_X \text{ s.t. } I(X;Y) = C}\ [\,\ldots\,]$$

[The quantity being minimized did not survive in the source.]

where $C$ is the channel capacity.


Note  that  the  existence  of  a  good  code  sequence  satisfy¬ 
ing  the  approximation  property  in  Theorem  1  for  any  fixed  k 
is  predicted  by  the  optimality  of  constant-composition  codes. 
But,  in  fact,  this  result  holds  for  any  good  code  sequence  be¬ 
cause  of  Theorem  1.  A  refinement  of  Theorem  1  entails  letting 
k  grow  with  n.  We  have  shown  that  any  growth  faster  than 
log  n  destroys  convergence. 
Abstract — In this work, for the memoryless source with unequal symbol probabilities, we derive the limiting distribution of the number of phrases in the Lempel-Ziv parsing scheme. This settles a long-standing open problem. In order to establish it we had to solve another open problem, namely, that of deriving the limiting distribution of the internal path length in a digital search tree.


I.  Introduction  and  Main  Results 

The  primary  motivation  for  this  work  is  the  desire  to  un¬ 
derstand  the  asymptotic  behavior  of  the  fundamental  parsing 
algorithm  on  words  due  to  Lempel  and  Ziv  [5].  It  partitions 
a  word  into  phrases  (blocks)  of  variable  sizes  such  that  a  new 
block  is  the  shortest  subword  not  seen  in  the  past  as  a  phrase. 
For  example,  the  string  110010100010001000  is  parsed  into 
(1)(10)(0)(101)(00)(01)(000)(100). 
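The parsing rule is short enough to state as code. The following is a minimal sketch of the incremental parsing described above, with a hypothetical function name; it returns the list of complete phrases together with any unfinished tail, since the last symbols of a word may repeat a phrase that has already been seen.

```python
def lz78_parse(word):
    """Split word into Lempel-Ziv phrases; returns (phrases, unfinished_tail)."""
    seen, phrases, current = set(), [], ""
    for symbol in word:
        current += symbol
        if current not in seen:   # shortest subword not yet seen as a phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    return phrases, current

phrases, tail = lz78_parse("110010100010001000")
print(phrases, repr(tail))
```

On the example string the routine reproduces the eight phrases listed above; the final symbol 0 repeats an earlier phrase and is returned as the unfinished tail.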

We study the distribution of the number of phrases $M_n$ constructed from a word of a fixed length $n$ in a probabilistic framework. We assume that the word is generated by a probabilistic memoryless binary source. That is: symbols are generated in an independent manner with "0" and "1" occurring respectively with probability $p$ and $q = 1 - p$. If $p = q = 0.5$, then we call it the symmetric Bernoulli model; otherwise we refer to the asymmetric Bernoulli model.

In order to study $M_n$, we reduce it to another problem on digital trees that is easier to handle. The reader is referred to [3] for a discussion and definition of digital trees. In short: the root of the tree is empty. All other phrases of the Lempel-Ziv parsing algorithm are stored in nodes. When a new phrase is created, the search starts at the root and proceeds down the tree, that is, symbol "0" in the input string means a move to the left and "1" means a move to the right. The search is complete when a branch is taken from an existing tree node to a new node that has not been visited before.
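The tree construction can be sketched with nested dictionaries. This is an illustration under stated assumptions, not the authors' code: phrases are inserted in parsing order, each node is a dictionary keyed by the symbols "0" and "1", and the function name is hypothetical. It records the depth at which each phrase is stored and the number of nodes created, so the internal path length is the sum of the depths.

```python
def digital_tree_depths(phrases):
    """Insert phrases into a binary digital tree; return (depths, number_of_new_nodes)."""
    root, depths, nodes = {}, [], 0
    for phrase in phrases:
        node, depth = root, 0
        for symbol in phrase:        # "0" = left branch, "1" = right branch
            if symbol not in node:   # branch to a node not visited before
                node[symbol] = {}
                nodes += 1
            node = node[symbol]
            depth += 1
        depths.append(depth)
    return depths, nodes

phrases = ["1", "10", "0", "101", "00", "01", "000", "100"]
depths, nodes = digital_tree_depths(phrases)
print(depths, nodes, sum(depths))
```

Each Lempel-Ziv phrase extends an earlier phrase by one symbol, so every insertion creates exactly one node, the depth of a phrase equals its length, and the internal path length equals the number of symbols parsed so far.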

Observe that for fixed $n$ the number of nodes in the associated digital tree is random and equal to $M_n$. We also consider a digital tree in which the number of nodes is fixed and equal to $m$, and we call such a model the digital tree model. For fixed $m$, we denote by $D_m(i)$ the length of the path from the root to the $i$th node (the $i$th depth). Then, the internal path length $L_m$ becomes $L_m = \sum_{i=1}^{m} D_m(i)$.

In view of the above definitions, we note that $M_n$ satisfies the following renewal equation $M_n = \max\{m : L_m = \sum_{i=1}^{m} D_m(i) \le n\}$, which directly implies that $\Pr\{M_n \ge m\} = \Pr\{L_m \le n\}$. Thus one can analyze $M_n$ through $L_m$ due to the following result of Billingsley [2]: If

$$\frac{L_m - \mu_m}{\sigma_m} \to N(0, 1) \qquad (1)$$

then

$$\frac{M_n - n/(\mu_n/n)}{\sigma_n (\mu_n/n)^{-3/2}} \to N(0, 1) \qquad (2)$$

where $N(0, 1)$ is the standard normal distribution, and $\mu_m$ and $\sigma_m$ are positive constants.

Let $L_m(u) = E u^{L_m}$ and $L(z, u) = \sum_{m=0}^{\infty} L_m(u) z^m / m!$ be generating functions of $L_m$ and $L_m(u)$, respectively. We can show that $L(z, u)$ satisfies the following differential-functional equation for a memoryless source

$$\frac{\partial L(z, u)}{\partial z} = L(pzu, u)\, L(qzu, u) \qquad (3)$$

with $L(z, 0) = 1$.

Using the above differential-functional equation and (2), we prove the following theorem that directly extends the results of Aldous and Shields [1], who established the limiting distribution of $M_n$ only for the symmetric Bernoulli model.

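Theorem. (i) For a memoryless source the following weak convergence result holds

$$\frac{M_n - E M_n}{\sqrt{\mathrm{Var}\, M_n}} \to N(0, 1) \qquad (4)$$

with $E M_n \sim [\ldots]$ and $\mathrm{Var}\, M_n \sim [\ldots]$ (the right-hand sides did not survive in the source), where $c^2 = (H - h^2)/h^3$ with $h = -p \log p - q \log q$ being the entropy of the alphabet and $H = p \log^2 p + q \log^2 q$. Moreover, moments of $M_n$ converge to the appropriate moments of the normal distribution. Finally,

$$\Pr\{ |M_n - E M_n| > \varepsilon E M_n \} \le A \exp(-a \varepsilon \sqrt{n}) \qquad (5)$$

for some constants $A > 0$ and $a > 0$.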

The theorem above has many applications in data compression (e.g., rate of convergence). For example, using it we established in [4] the limiting distribution of the phrase length. Furthermore, using the large deviation result (5), we can obtain information about the Lempel-Ziv code redundancy $R_n$. That is, $\Pr\{R_n > \varepsilon\} = \Pr\{M_n(\log M_n + 1) > n(h + \varepsilon)\} \le A \exp(-a \varepsilon \sqrt{n})$ for $\varepsilon < h$.
Abstract — A class of multiple dictionary Lempel-Ziv algorithms is described, where a set of context dependent dictionaries is maintained, and a dictionary is chosen based on empirical performance data. These algorithms are conceptually simpler than an earlier approach based on dynamic programming [1] and are also asymptotically optimal.

It  is  well  known [3]  that  the  context  of  a  symbol  (the 
preceding  few  symbols)  can  be  used  to  improve  compres¬ 
sion  or  prediction  of  the  symbol.  For  example,  the  con¬ 
text  algorithm[3,  4]  chooses  the  estimated  best  context 
for  compression  via  arithmetic  coding.  However,  the  most 
popular  techniques  are  based  on  Lempel-Ziv  coding.  In 
LZ78,  a  tree  structured  dictionary  is  constructed  using 
the  source  sequence,  and  then  used  for  compression.  Plot- 
nik,  Weinberger  and  Ziv[2]  consider  a  source  generated  by 
a  finite  state  Markov  chain  and  show  that  maintaining 
separate  dictionaries  for  each  state  of  the  source  machine 
improves  the  rate  of  convergence  of  the  algorithm. 

In  [1],  a  class  of  context  dependent  extensions  to  the 
Lempel-Ziv  algorithm  were  described,  in  which  multiple 
dictionaries  were  maintained,  of  which  a  subset  (called 
the  basis  set,  corresponding  to  a  complete  suffix  tree  of 
contexts)  was  chosen  via  dynamic  programming  to  opti¬ 
mize  an  estimate  of  the  compression  achievable  over  the 
next  phrase.  This  family  of  algorithms  was  shown  to  be 
asymptotically  optimal,  and  showed  promise  of  improved 
compression. 

We  here  develop  an  alternative  approach  where  the  set 
of  contexts  selected  at  a  given  time  need  not,  as  in  [1],  cor¬ 
respond  to  a  complete  suffix  tree.  The  method  utilizes  a 
more  extensive  set  of  performance  estimates,  which  how¬ 
ever  is  available  via  direct  empirical  observations  for  the 
proposed  dictionary  construction  algorithms. 

Associated with every context of length $\le D$, we maintain a dictionary consisting of phrases seen in that context and the empirical performance of such dictionaries. For example, if $D = 3$, then, corresponding to the maximal depth context 010, we maintain a record of the performance of the dictionaries corresponding to context ∅ (the null context), 0, 10, and 010. These $D + 1$ numbers are updated each time the context is seen at the end of a phrase (not just when the context is actually used for compression). Compression of the next phrase is then via the dictionary corresponding to a current best empirical context. The decoder maintains the same estimates, and therefore knows the dictionary used.

Dictionary maintenance algorithms that we consider are closely related
to those of [1]. For two of those algorithms, empirical performance
measures are directly available as a consequence of the construction
process. The two algorithms are (following the names in [1]):

•  Algorithm 2 - Multiple dictionaries: Separate Lempel-Ziv trees are
   maintained for each possible context of depth up to D. Phrases are
   added to the corresponding dictionary by means of constructs termed
   tokens: a token is added to the root of the LZ tree every time the
   context is seen at the end of a phrase, and is then advanced through
   the tree using the subsequent symbols, ultimately being promoted to
   form a new node. When this occurs, the performance measures are
   updated.

•  Algorithm 3' - Compound dictionary: In [1], it was suggested that it
   would be more efficient to view the multiple dictionaries as subtrees
   of a single larger dictionary, reached from the root via the
   appropriate context. This makes more efficient use of storage, and
   here too, tokens are used in the updating procedures.

Algorithms 2 and 2’ have substantial overhead, which may be regarded as
“wasted” since many of the dictionaries may not be used for compression.
However, in Algorithm 3’, dictionaries not used for compression also
contribute to the growth of useful dictionaries, yielding better
performance. Both the algorithms above are asymptotically optimal, as
shown by the results of [1]. Experimental results on binary versions of
ASCII files show that these new methods do better than standard
Lempel-Ziv, and perform close to context allocation with dynamic
programming.

Abstract — The minimum universal coding redundancy for finite-state
arbitrarily varying sources is investigated. If the space of all
possible underlying state sequences is partitioned into types, then this
minimum can be essentially lower bounded by the sum of two terms. The
first is the minimum redundancy within the type class and the second is
the minimum redundancy associated with a class of sources that can be
thought of as “representatives” of the different types. While the first
term is attributed to the cost of uncertainty within the type, the
second term corresponds to the type itself. The bound is achievable by a
Shannon code w.r.t. an appropriate two-stage mixture of all arbitrarily
varying sources in the class.

We investigate the minimum attainable redundancy in universal coding for
arbitrarily varying sources (AVS’s). An AVS is a nonstationary
memoryless source characterized by the probability mass function (PMF)

$$p(\mathbf{x}|\mathbf{s}) = \prod_{i=1}^{n} p(x_i|s_i), \qquad (1)$$

where $\mathbf{x} = (x_1, \ldots, x_n)$ is an observed data sequence to
be encoded, $x_i$ taking on values in a finite set $\mathcal{X}$, and
$\mathbf{s} = (s_1, \ldots, s_n)$ is an unknown arbitrary sequence of
states corresponding to $\mathbf{x}$, where each $s_i$ takes on values
in a set $\mathcal{S}$. We shall assume, for the sake of simplicity,
that the parameters of the AVS,
$\{p(x|s)\}_{x \in \mathcal{X}, s \in \mathcal{S}}$, are known.

The problem of universal coding for AVS’s has received relatively little
attention. Berger [1, Sect. 6.1.2] and Csiszár and Körner [2, Theorem
4.3] have characterized the best attainable rate-distortion tradeoff for
block-to-block (BB) codes where the average distortion is required to be
within a prescribed level D for the worst possible state sequence. For
the distortionless case (D = 0) the best attainable rate in this sense
is given by the entropy of the worst memoryless source in the convex
closure of $\{p(\cdot|s), s \in \mathcal{S}\}$, that is, the maximum
entropy attained among all mixtures
$m(x) = \int_{\mathcal{S}} w(ds)\, p(x|s)$, $w$ being a probability
measure on $\mathcal{S}$. The reason for this worst-case result is that
the rate is held fixed at each block and, at the same time, the
distortion constraint must be met for every possible state sequence.

We show that one can improve upon this pessimistic result if
variable-rate codes are allowed, because then there is a potential
freedom to “adapt” the rate to the underlying state sequence in some
sense. Specifically, we show that for finite-state AVS’s there exists a
lossless block-to-variable (BV) code whose compression ratio is
essentially the entropy of the memoryless source
$m_{\mathbf{s}}(x) = \sum_{s \in \mathcal{S}} w_{\mathbf{s}}(s)\, p(x|s)$,
where $w_{\mathbf{s}}(s)$ is the empirical probability (i.e., relative
frequency) of $s \in \mathcal{S}$ along the underlying state sequence
$\mathbf{s}$. This entropy is of course never larger than the maximum
entropy mentioned above. It is therefore easy to see that the
redundancy, namely, the excess rate beyond the per-letter entropy of the
AVS given $\mathbf{s}$, is essentially equal to the mutual information
$I_{w_{\mathbf{s}}}(S; X)$ associated with the joint PMF
$w_{\mathbf{s}}(s)\, p(x|s)$. This quantity in turn agrees with that of
[1] and [2] only if $\mathbf{s}$ maximizes the entropy.
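
The relation between the mixture entropy, the per-letter entropy given
the state sequence, and the mutual information can be illustrated
numerically. The sketch below assumes a finite alphabet indexed
0, 1, ..., and a hypothetical `channel` dictionary mapping each state s
to the list p(.|s); these names are illustrative only.

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a PMF given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def avs_rates(state_seq, channel):
    """Return (H(m_s), H(X|S), I(S;X)) for the empirical state PMF w_s.

    state_seq : sequence of states s_1, ..., s_n
    channel   : dict mapping state s to the list [p(0|s), p(1|s), ...]
    """
    n = len(state_seq)
    # Empirical probability (relative frequency) of each state.
    weights = {s: state_seq.count(s) / n for s in set(state_seq)}
    alphabet_size = len(next(iter(channel.values())))
    # Mixture m_s(x) = sum_s w_s(s) p(x|s).
    mixture = [sum(w * channel[s][x] for s, w in weights.items())
               for x in range(alphabet_size)]
    h_mixture = entropy(mixture)
    # Per-letter entropy of the AVS given the state sequence.
    h_given_states = sum(w * entropy(channel[s]) for s, w in weights.items())
    return h_mixture, h_given_states, h_mixture - h_given_states
```

For two deterministic states used equally often, the mixture entropy is
1 bit while the entropy given the states is 0, so the whole rate is
redundancy in the sense described above.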

Furthermore, $I_{w_{\mathbf{s}}}(S; X)$ is essentially a lower bound on
the redundancy in a fairly strong sense. If we consider the set of all
state sequences of a certain type class (i.e., sequences that share the
same empirical PMF $w_{\mathbf{s}}$ and hence yield the same
$m_{\mathbf{s}}$), then by a direct application of [3, Theorem 1], for
any uniquely decipherable code that is independent of $\mathbf{s}$, the
redundancy is essentially never less than $I_{w_{\mathbf{s}}}(S; X)$ for
most state sequences in this type class.

This bound is valid even if the type class is known a priori. But if the
type class is not known in advance, intuition suggests that there must
be an additional cost. We next demonstrate a coding scheme that is
optimal in the sense of yielding the minimum attainable extra term,
which in turn can be thought of as the redundancy associated with
universal coding for a class of auxiliary sources that “represent” the
different type classes in a certain sense. Specifically, the proposed
coding scheme can be interpreted as a hierarchical, two-step universal
code, where the first step is to construct the best universal code
within each type, and the second is to optimally integrate these codes
by constructing another universal code for the class of the
above-mentioned auxiliary sources. The optimality of the proposed
hierarchical code is in the sense that for any other code, most type
classes have the property that, except for a small minority of state
sequences in the type class, the redundancy is essentially never less
than the redundancy of the proposed code.

Finally, we point out that a natural subdivision of a class $\Lambda$ of
sources into subclasses $\Lambda_1, \Lambda_2, \ldots$ takes place in
other situations as well. Another example is the class of all Markov
sources, where $\Lambda_i$ is the class of $i$th order Markov sources.
The hierarchical universal coding approach demonstrated here extends in
the general case to a Shannon code w.r.t. the double mixture, first over
each $\Lambda_i$ and then over $\{i\}$. Such a code was called “twice
universal” in [4]. Similarly to Theorem 2, it can be shown that any
other code cannot outperform the twice universal code for “most” points
in every $\Lambda_i$, except for a minority of classes $\Lambda_i$. Here
by “most” we mean with high probability as measured by the mixture
weights.

I. Introduction

A two-stage procedure using the sufficient statistics of the parameters
of the source models is proposed in this paper. In the procedure, the
sufficient statistics calculated from a source sequence are transmitted
at the first stage. At the second stage, the source sequence is encoded
by using the conditional distribution given the sufficient statistics.
Although quantization is needed to transmit the estimator vector in the
previous two-stage codes [1][2], no quantization is needed to transmit
the sufficient statistics, since they are discrete random variables.
Moreover, the redundancy of the proposed code is equal to that of the
Bayes code [3][4].

II.  The  previous  two-stage  codes 

We assume throughout this paper that a class of parameterized
distributions of an information source is known but the parameters of
the distribution function are unknown. Let $x_i \in A$ be a source
symbol in a finite alphabet $A$. A source sequence is denoted by
$x^n = x_1 x_2 \cdots x_n$. A parameterized distribution is denoted by
$P(x^n|\theta)$, where $\theta \in \Theta$ is a real parameter vector in
the parameter space $\Theta$ of the distribution.

In the coding procedures using the MDL [1] or MML [2] criterion, at the
first stage, the estimator $\hat{\theta}(x^n)$ of the parameter $\theta$
estimated from a source sequence $x^n$ is encoded. At the second stage,
the source sequence is encoded by using the estimator
$\hat{\theta}(x^n)$. The codeword length of these procedures is
represented by

$$L_M(x^n) = L(\hat{\theta}(x^n)) - \log P(x^n|\theta = \hat{\theta}(x^n)). \qquad (1)$$

The first term of the right-hand side, $L(\hat{\theta}(x^n))$,
represents the description length of the estimator vector
$\hat{\theta}(x^n)$ itself. The second term is the ideal codeword length
of a source sequence $x^n$ encoded by
$P(x^n|\theta = \hat{\theta}(x^n))$: the distribution whose parameter is
replaced by the estimator $\hat{\theta}(x^n)$.

However, since the parameter vector is real, quantization of the
estimator vector $\hat{\theta}(x^n)$ is needed to transmit it. The MDL
criterion was derived by choosing the quantization scale of the
estimator $\hat{\theta}(x^n)$ to minimize the total description length
$L_M(x^n)$. The MML criterion was also derived by studying encoding
methods for the estimator using the prior distribution $P(\theta)$.

III.  Sufficient  statistics  code 

The fundamental difference between sufficient statistics codes and the
previous two-stage codes is that the sufficient statistics $u(x^n)$ are
transmitted instead of the estimator $\hat{\theta}(x^n)$.

The sufficient statistics $u(x^n)$ satisfy the following equality:

$$H(\theta|u(x^n)) = H(\theta|x^n). \qquad (2)$$

The above equality indicates that the sufficient statistics $u(x^n)$
include all information with respect to $\theta$ in a source sequence
$x^n$. Thus, there is no information loss with respect to $\theta$ by
transmitting $u(x^n)$ instead of the estimator $\hat{\theta}(x^n)$.

In sufficient statistics codes, at the first stage, the sufficient
statistics $u(x^n)$ are encoded and transmitted. At the second stage,
the source sequence $x^n$ is encoded by using $P(x^n|u(x^n))$: the
conditional probability of $x^n$ given $u(x^n)$.

The encoding probability $P(x^n|u(x^n))$ at the second stage differs
essentially from $P(x^n|\theta = \hat{\theta}(x^n))$ used in the
previous two-stage procedures. $P(x^n|\theta = \hat{\theta}(x^n))$ is
given by substituting $\hat{\theta}(x^n)$ for $\theta$ in the source
distribution $P(x^n|\theta)$. It is different from the conditional
probability $P(x^n|\hat{\theta}(x^n))$ under the condition that
$\hat{\theta}(x^n)$ is estimated from a source sequence $x^n$. In
sufficient statistics codes, the conditional probability
$P(x^n|u(x^n))$, under the condition that the sufficient statistics
$u(x^n)$ are calculated from $x^n$, is used as the encoding probability.

The ideal codeword length $L_S(x^n)$ of sufficient statistics codes is
given as follows:

$$L_S(x^n) = L(u(x^n)) - \log P(x^n|u(x^n)). \qquad (3)$$

The first term of the right-hand side of the above expression,
$L(u(x^n))$, represents the description length of the sufficient
statistics $u(x^n)$ in the first stage of the procedure. Although
quantization is needed to transmit the estimator vector
$\hat{\theta}(x^n)$ in the previous two-stage codes, no quantization is
needed to transmit $u(x^n)$, since the sufficient statistics are
discrete random variables.

The second term is the ideal codeword length of the source sequence
$x^n$ in the second stage. The term is uniquely determined by the
conditional probability $P(x^n|u(x^n))$. Hence the total codeword length
of sufficient statistics codes depends on the coding method of $u(x^n)$.

Theorem 1 The ideal codeword length $L_S(x^n)$ of the sufficient
statistics code is identical with that of the Bayes code if the
description length $L(u(x^n))$ of $u(x^n)$ is as follows:

$$L(u(x^n)) = -\log \int P(u(x^n)|\theta)\, P(\theta)\, d\theta. \qquad (4)$$
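
Theorem 1 can be checked numerically in a simple special case. The
sketch below assumes a binary memoryless (Bernoulli) source with a
uniform prior on the parameter, for which the number of ones is a
sufficient statistic; the source class, the prior, and the function
names are illustrative assumptions, not taken from the paper.

```python
import math

def sufficient_statistics_length(x):
    """Ideal two-stage length (bits) for a binary sequence x.

    Assumes a Bernoulli source with a uniform prior on theta; the
    sufficient statistic u is the number of ones k.
    """
    n, k = len(x), sum(x)
    # Stage 1: L(u) = -log2 of the integral of C(n,k) theta^k (1-theta)^(n-k),
    # which equals 1/(n+1) under the uniform prior.
    stage1 = math.log2(n + 1)
    # Stage 2: P(x^n | u) = 1 / C(n,k), uniform over sequences with k ones.
    stage2 = math.log2(math.comb(n, k))
    return stage1 + stage2

def bayes_length(x):
    """Ideal Bayes code length (bits) under the same uniform prior."""
    n, k = len(x), sum(x)
    # Integral of theta^k (1-theta)^(n-k) over [0,1] = k!(n-k)!/(n+1)!
    mixture = math.factorial(k) * math.factorial(n - k) / math.factorial(n + 1)
    return -math.log2(mixture)
```

For x = 1, 0, 1, 1, 0 both functions give log2(60) bits, since
(n + 1) C(n, k) = 6 * 10 = 60.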

Various types of sufficient statistics codes can be constructed by
changing the prior distribution $P(\theta)$, as with Bayes codes. In
particular, minimax redundancy codes are constructed by using the least
favorable prior for the redundancy risk.

Abstract — Two modifications of the Lempel-Ziv-Welch (LZW) algorithm are
presented to limit the dictionary size. First, a run-length encoding
(RLE) is combined with the LZW algorithm, in order to pre-select the
input data. Then, a dynamic update of the dictionary is performed by
eliminating the free branches in the tree representing the dictionary.

I.  Introduction 

The LZW technique is included in the V42 bis recommendation of the CCITT
and it is widely used in communications. Basically, it is a lossless,
non-statistical compression algorithm which maps variable-length strings
to fixed-length indexes (codewords). It has the advantage of being
adaptive and does not assume any advance knowledge of the source
properties. It uses a dictionary which is built by performing a string
matching after each source symbol occurrence. Strings of different
lengths are represented by indexes in the dictionary, which is the same
at the encoder and decoder. During the compression phase, the dictionary
is built on the basis of the input symbols and the coder becomes more
efficient with the growth of the table [1], [3]. However, once the
dictionary is full, no adaptivity is provided any more. To be able to
add very long strings to the table, the algorithm needs a very large
dictionary; the codewords then become very long and the compression
ratio decreases. To counter this problem a compromise is necessary. In
fact, the compression is optimised when the dictionary is a real mirror
of the input statistics. With text or source files, long repetitive
strings provide less information, and, thus, the corresponding space in
the dictionary is not efficiently used. Therefore, the algorithm must
continuously update the dictionary, without increasing its size. That
can be achieved by the two following schemes:

- Combining the LZW algorithm with a run-length encoding to avoid
overloading the dictionary with long repetitive sequences
(pre-selection).

- Eliminating less probable strings from the dictionary, in order to
keep a sufficient level of adaptivity for the algorithm.

II. COMBINING LZW AND RUN-LENGTH ENCODING

The run-length encoding eliminates the repeated symbols from the input
data. The number N of repetitions must be greater than a pre-defined
threshold. It replaces all such repetitions in the stream of data with a
special sequence. The LZW algorithm can be introduced in the cascade as
follows. The creation of a new string (Xi+y) in the dictionary is made
by concatenating a unique character (y) with a string (Xi) present in
the dictionary. The run-length encoding technique scans the input
strings; if the input is a repetition of N symbols (N greater than a
pre-defined threshold), the algorithm outputs a run-length encoding to
code the repetitions, without using the dictionary. Then, the LZW-RLE
coder continues the coding process normally with the LZW algorithm. The
compression ratios achieved respectively by the LZW encoder and the RLE
are compared. According to our simulations, the threshold value N must
be greater than 10.
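
The pre-selection step can be sketched as follows. The paper does not
specify the format of the "special sequence", so the token tuples below
("run", symbol, count) and ("lit", symbol) are purely hypothetical; only
literal tokens would be passed on to the LZW stage.

```python
def rle_preselect(data, threshold=10):
    """Split data into run tokens and literal tokens.

    A run of identical symbols longer than `threshold` becomes a single
    ("run", symbol, count) token; everything else is emitted as
    ("lit", symbol) tokens intended for the LZW coder.
    """
    tokens = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # extend the current run
        run_length = j - i
        if run_length > threshold:
            tokens.append(("run", data[i], run_length))
        else:
            tokens.extend(("lit", data[i]) for _ in range(run_length))
        i = j
    return tokens
```

With the default threshold, a run of 12 identical symbols is removed
from the LZW input, while a run of 3 is left for the dictionary to
handle.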


III.  Dynamic  update 

In the LZW algorithm, the update of the dictionary stops when the
dictionary is full. For example, in the CCITT V42 bis recommendation
[2], when the dictionary is full, the algorithm deletes the old
dictionary (flush) and starts building a new one. The compression ratio
decreases considerably after the dictionary flush. Instead of deleting
the entire dictionary, it is proposed to delete just a section, namely
all the free branches of the tree representing it. The procedure is as
follows. While building the dictionary, the algorithm marks all the free
branches, using a one-row table. Once the dictionary is full, the flush
phase deletes all the branches already marked. This technique keeps the
very long strings, so that the statistical properties of the input are
well known, and the previously deleted branches are used to continue the
update. The number of free branches deleted in each flush phase allows
us to follow the evolution of the algorithm and estimate whether it is
better to delete only a part or all of the dictionary. In fact, after
several updates, the number of free branches tends to become a constant
value. It corresponds to the saturation of the dictionary. At this
point, deleting the whole dictionary is the best solution.

IV.  Results 

The improvement in compression ratio with the LZW-RLE coder is confirmed
by several tests with standard files. It provides better performance
during the learning phase with less complexity. The improvement is
around 4 to 6 percent with respect to the LZW algorithm alone. The
dynamic update acts once the dictionary is full. It provides an
improvement of 4 percent with respect to the V42 bis method. The flush
threshold value seems to be around 1000 free branches.

V. CONCLUSION

In this paper, two modified algorithms based on combining the run-length
encoding with LZW and on the free-branch deleting method have been
analysed and simulated. A significant compression ratio improvement is
achieved with the LZW-RLE coder when repetition sequences are present in
the file. The LZW-RLE coder does not affect the compression ratio in the
case of normal files (files without repetitions). Application to speech
and image coding leads to further refinements which are presently under
study.

Abstract — We study the universal coding problem for the integers; in
particular, we establish rather sharp lower and upper bounds for the
Elias omega code and other codes. For these bounds, the so-called
log-star function plays the central role. Furthermore, we investigate
unbounded search trees induced by these codes, including the
Bentley-Yao search tree.


1. Elias omega code and related codes

Let us denote the standard binary expression of a positive integer
$j \in \mathcal{N}^+$ as $(j)_2$. For example, $(13)_2 = 1101$. The
binary expression of integer $j$ to base $2^k$ is denoted by
$(j)_{2,k}$. Next we express the floor function of log by
$\lambda_2(j) = \lfloor \log_2 j \rfloor$. Moreover, $\lambda_2^k$ is
the $k$-fold composition of the function $\lambda_2$.

Elias [1] introduced a universal code
$\omega : \mathcal{N}^+ \to \{0,1\}^*$, called the $\omega$-code,
described by

$$\omega(j) = \begin{cases} 0, & \text{if } j = 1, \\ (\lambda_2^{k-1}(j))_2 \cdots (\lambda_2(j))_2\, (j)_2\, 0, & \text{if } j \ge 2, \end{cases} \qquad (1)$$

where $k = k(j)$ is the positive integer satisfying $\lambda_2^k(j) = 1$
(which exists for any $j \ge 2$). Then the codeword length of this
prefix code $\omega$ is given by

$$c_E(j) = |\omega(j)| = \sum_{m \ge 1:\ \lambda_2^m(j) \ge 0} (\lambda_2^m(j) + 1) \quad (j = 1, 2, \ldots). \qquad (2)$$

Another class of universal codes introduced by Stout [3] is given, for
any integer $d \ge 0$, by

$$S_d(j) = \begin{cases} (j)_{2,d}\, 0, & \text{if } 0 \le j < 2^d, \\ (\lambda_{[d]}^{k}(j))_{2,d}\, (\lambda_{[d]}^{k-1}(j))_2 \cdots (\lambda_{[d]}^{2}(j))_2\, (\lambda_{[d]}(j))_2\, (j)_2\, 0, & \text{if } j \ge 2^d, \end{cases} \qquad (3)$$

for $j \in \mathcal{N} = \{0, 1, \ldots\}$, where

$$\lambda_{[d]}(x) = \lfloor \log_2 x \rfloor - d \quad (x > 0), \qquad (4)$$

$\lambda_{[d]}^t$ is the $t$-fold composition of the function
$\lambda_{[d]}$, and $k$ is the positive integer satisfying
$0 \le \lambda_{[d]}^k(j) < 2^d$.

Stout has defined the code $S_d$ only for $d \ge 2$. $S_0$ is identical
to the code introduced by Levenshtein [4].

2. Bounds for the codeword lengths

In order to introduce the bound for $c_E(j)$, we define the log-star
function $\log_2^*(x)$ for $x \ge 1$ as

$$\log_2^*(x) = \log_2(x) + \log_2 \log_2(x) + \cdots + \log_2^{w^*(x)}(x), \qquad (5)$$

where $\log_2^k(x)$ is the $k$-fold composition of the function
$\log_2(x)$, and $w^*(x)$ is the largest positive integer satisfying
$\log_2^{w^*(x)}(x) \ge 0$. Therefore, $w^*(x) = 1$ and
$\log_2^*(x) = 0$ for $x = 1$.

Then we established upper and lower bounds for the length function
$c_E(j)$.

□ Theorem 1 For any real $x \ge 1$,

$$\log_2^*(x) < c_E(x) \le \log_2^*(x) + w^*(x). \qquad (6)$$

Here we have extended the domain of the function $c_E(\cdot)$ to the set
of real numbers through the extension of $\lambda_2$. Through a simple
consideration, we can check that the upper bound is attained at the
points $j_m = \exp_2^m(1)$ $(m = 0, 1, \ldots)$, where
$\exp_2(x) = 2^x$ and $\exp_2^k(x)$ is the $k$-fold composition of the
function $\exp_2(\cdot)$. Moreover, the lower bound is also attained at
the same points in the sense of

$$\lim_{x \uparrow j_m} c_E(x) = \log_2^*(j_m). \qquad (7)$$

Therefore, the two bounds are best possible as far as we restrict the
bounding functions to such smooth functions.

We can obtain the same bounds for the codeword length of the Stout code
$S_d$ by a similar argument.

Furthermore, the unbounded search trees induced by the Elias omega code
and Stout codes have a more beautiful recursive structure than the
Bentley-Yao search tree [2].

3. Modified log-star function

Define the modified log-star function by

$$\log_{r,\alpha}^*(x) = \log_r^*(x) - \alpha\, w_r^*(x) \quad (x \ge 1) \qquad (8)$$

for integer $r \ge 2$ and real number $\alpha$. Then, we have

□ Lemma 1 For integer $r \ge 2$, set $\alpha^* = \log_r(\log_r e)$.

1) If $\alpha < \alpha^*$, then

$$\sum_{j=1}^{\infty} r^{-\log_{r,\alpha}^*(j)} < +\infty. \qquad (9)$$

2) If $\alpha > \alpha^*$, then

$$\sum_{j=1}^{\infty} r^{-\log_{r,\alpha}^*(j)} = +\infty. \qquad (10)$$

From the lemma, we can show the existence of better prefix codes than
the Elias omega code and other known codes.
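
The Elias omega code and the bound of Theorem 1 can be explored with a
short sketch. The function names are illustrative, and codewords are
represented as Python strings of '0' and '1' for readability; this is a
minimal illustration rather than an efficient implementation.

```python
import math

def omega_encode(j):
    """Elias omega codeword of a positive integer j, as a bit string."""
    if j < 1:
        raise ValueError("j must be a positive integer")
    code = "0"                       # terminating zero
    while j > 1:
        binary = bin(j)[2:]          # standard binary expression (j)_2
        code = binary + code         # prepend, so the shortest block comes first
        j = len(binary) - 1          # lambda_2(j) = floor(log2 j)
    return code

def omega_decode(bits):
    """Decode one omega codeword from the start of a bit string."""
    j, pos = 1, 0
    while bits[pos] == "1":          # every block starts with 1; a 0 ends the codeword
        length = j + 1
        j = int(bits[pos:pos + length], 2)
        pos += length
    return j

def log_star(x):
    """Return (log2*(x), w*(x)) for real x >= 1."""
    total, count = 0.0, 0
    while x > 0:
        x = math.log2(x)
        if x < 0:
            break                    # iterated logarithm became negative
        total += x
        count += 1
    return total, count
```

For example, `omega_encode(4)` gives `101000` (length 6), while
`log_star(4)` returns `(3.0, 3)`, so the upper bound in (6) holds with
equality at 4 = exp2(exp2(1)).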

Abstract — The context tree weighting algorithm was introduced at the
1993 ISIT. Here we are concerned with the context tree maximizing
algorithm. We discuss several modifications of this algorithm.


I.  Introduction 

In this paper we assume that the source has a tree structure. The
context (e.g. the most recent symbols from the source sequence) selects
one of the leaves. Symbols following this context are assumed to be
independent. The tree structure is called the model of the source. A
full tree with depth D and with symbol counts in its nodes and leaves is
called a context tree. In [2] a one-pass algorithm, the context tree
weighting algorithm, was introduced. This method uses such a tree.

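For the context tree weighting algorithm it was proved that the
individual redundancy $\rho$ of a source sequence $x_1^T$, with respect
to a binary source with model $\mathcal{S}$ and with parameter vector
$\Theta_{\mathcal{S}}$, satisfies (the terms represent the model,
parameter and coding redundancy respectively):

$$\rho(x_1^T \mid x_{1-D}^0, \mathcal{S}, \Theta_{\mathcal{S}}) < (2|\mathcal{S}| - 1) + \left( \frac{|\mathcal{S}|}{2} \log \frac{T}{|\mathcal{S}|} + |\mathcal{S}| \right) + 2.$$

This holds for every model $\mathcal{S}$ and every parameter vector
$\Theta_{\mathcal{S}}$.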

The context tree maximizing algorithm (see also [1]), a two-pass
algorithm, fulfils the same upper bound, but at the same time, it will
give a slightly longer codeword. During the first pass the counts in the
tree will be updated. After the first pass the two-pass algorithm will
determine the “best” model, and in the second pass it uses this model to
compress the sequence. Two-pass algorithms can have distinct advantages.
Most important is that their decoding complexity is considerably less
than the complexity of the weighting algorithm.


II.  The  context  maximizing  algorithm 

Just like the weighting algorithm, this algorithm uses the
Krichevsky-Trofimov estimator for encoding memoryless sequences. This
results in the following block probability for a sequence with $a$ zeros
and $b$ ones (if $a > 0$ and $b > 0$):

$$P_e(a, b) = \frac{\frac{1}{2} \cdot \frac{3}{2} \cdots (a - \frac{1}{2}) \cdot \frac{1}{2} \cdot \frac{3}{2} \cdots (b - \frac{1}{2})}{1 \cdot 2 \cdots (a + b)}.$$


In every node of the context tree we compute the maximized probability
according to the following formula. With $D$ we denote the maximum level
of the tree, and $l(s)$ is the depth of the context in node $s$. We
define

$$P_m^s = \begin{cases} P_e(a_s, b_s) & \text{if } l(s) = D, \\ \frac{1}{2} \max\left(P_e(a_s, b_s),\ P_m^{0s} P_m^{1s}\right) & \text{if } l(s) < D. \end{cases}$$

One can find the model by walking depth-first through the tree. If the
product of the maximized probabilities of the children is larger than
the $P_e$ in node $s$, then $s$ must be an internal node of the model,
else $s$ is a leaf. The maximizing algorithm will find a model which
minimizes the description length (MDL). The description length is the
sum of the cost needed to describe the model (the factors
$\frac{1}{2}$) and the cost of describing the data with this model
($P_e$).
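
The maximizing recursion can be sketched in a few lines. The code below
is an illustration under stated assumptions: contexts are tuples with
the most recent symbol first, the D symbols preceding the sequence are
assumed known (`past`), and exact rational arithmetic is used instead of
the logarithmic arithmetic a practical coder would need. All function
names are hypothetical.

```python
from fractions import Fraction

def kt_probability(a, b):
    """Krichevsky-Trofimov block probability for a zeros and b ones."""
    numerator = Fraction(1)
    for i in range(a):
        numerator *= Fraction(2 * i + 1, 2)   # 1/2 * 3/2 * ... * (a - 1/2)
    for i in range(b):
        numerator *= Fraction(2 * i + 1, 2)   # 1/2 * 3/2 * ... * (b - 1/2)
    denominator = 1
    for i in range(1, a + b + 1):
        denominator *= i                      # (a + b)!
    return numerator / denominator

def build_counts(past, data, depth):
    """First pass: count zeros and ones following every context up to depth."""
    counts = {}
    history = list(past)
    for symbol in data:
        for d in range(depth + 1):
            context = tuple(reversed(history[len(history) - d:]))
            counts.setdefault(context, [0, 0])[symbol] += 1
        history.append(symbol)
    return counts

def maximize(counts, depth, s=()):
    """Return (maximized probability of node s, leaves of the model below s)."""
    a, b = counts.get(s, (0, 0))
    pe = kt_probability(a, b)
    if len(s) == depth:
        return pe, [s]
    p0, leaves0 = maximize(counts, depth, s + (0,))
    p1, leaves1 = maximize(counts, depth, s + (1,))
    if p0 * p1 > pe:                          # splitting pays off: internal node
        return Fraction(1, 2) * p0 * p1, leaves0 + leaves1
    return Fraction(1, 2) * pe, [s]           # otherwise s is a leaf
```

For the alternating sequence 0, 1, 0, 1, ... with depth 1, the recursion
splits the root, because each depth-1 context is always followed by the
same symbol.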


III.  The  yoyo  algorithm 

The maximizing algorithm can be modified such that it produces a model
with not more than $C$ leaves (parameters), to limit the complexity of
the decoder. We walk through the context tree again in a depth-first
search way. In every node we compute a list which contains, for all
$c = 1, \ldots, C$, the maximized probability achievable with not more
than $c$ leaves. In each node the list can be computed by combining the
estimated probability in that node with the lists from its two children.

For every total number of leaves one looks for the distribution of
leaves over its two children that results in the highest product of the
maximized probabilities. Finally one finds a list in the root with, for
every number of leaves up to $C$, the corresponding maximized
probability.

To determine the list in the root one needs at most $D + 1$ open lists.
Once one knows the appropriate total number of leaves, one knows which
distribution of the number of leaves over each child (of the root)
resulted in this “optimal” solution. In this way the problem is reduced
to two trees of depth $D - 1$. If one applies this technique
recursively, we will find the best constrained model.
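
The list computation described above can be sketched as follows. This is
an illustration only: it assumes that the factor 1/2 of the
unconstrained recursion is kept at every internal node, it computes the
root list without the memory-saving traversal or the recovery of the
model, and the names `leaf_limited_lists` and `counts` (a dictionary
from context tuples to zero/one counts) are hypothetical.

```python
from fractions import Fraction

def kt_probability(a, b):
    """Krichevsky-Trofimov block probability for a zeros and b ones."""
    numerator = Fraction(1)
    for i in range(a):
        numerator *= Fraction(2 * i + 1, 2)
    for i in range(b):
        numerator *= Fraction(2 * i + 1, 2)
    denominator = 1
    for i in range(1, a + b + 1):
        denominator *= i
    return numerator / denominator

def leaf_limited_lists(counts, depth, max_leaves, s=()):
    """Return best, where best[c] is the maximized probability of node s
    achievable with not more than c leaves (best[0] is 0 by convention)."""
    a, b = counts.get(s, (0, 0))
    as_leaf = kt_probability(a, b)
    if len(s) == depth:
        return [Fraction(0)] + [as_leaf] * max_leaves
    list0 = leaf_limited_lists(counts, depth, max_leaves, s + (0,))
    list1 = leaf_limited_lists(counts, depth, max_leaves, s + (1,))
    best = [Fraction(0)]
    for c in range(1, max_leaves + 1):
        value = as_leaf / 2                   # option 1: s itself is a leaf
        for c0 in range(1, c):                # option 2: split c leaves over the children
            value = max(value, list0[c0] * list1[c - c0] / 2)
        best.append(value)
    return best
```

Because each entry allows "not more than" c leaves, the lists are
non-decreasing in c, so it suffices to try splits with c0 + c1 = c.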

IV.  Model  description  on-the-fly 

Instead of sending the model description first, followed by the code for
the data, we now use a growing model. The decoder walks through the
context tree as far as the current model allows. If the new context
passes an endpoint (leaf) of the current model which is not yet known to
be a leaf or internal node of the MDL model, and this new context
differs from the previous ones that have passed this endpoint, then the
decoder needs more information about the model. We must first tell the
decoder whether the endpoint is a leaf or not. If not, we should give
the same information about the next node on the context path, etc. This
process ends when the current context diverges from the previous ones.
The diverging node must be included. In this way the current model grows
to the MDL model.

In total the encoder now has to describe all internal nodes of the found
model, plus all leaves (not at the maximum depth) which are followed by
different context sequences.

With this technique we gain compared to the original two-pass algorithm.
But the model costs in the weighting algorithm are similar. The
maximizing algorithms can be modified such that the best “on-the-fly
models” will be found.

V.  Improved  model  description 

In binary, but especially in non-binary trees, with on-the-fly model
description, the number of children of a node that need specification is
not known in advance. Using an estimator, e.g. $P_e$, to specify these
children, we get improved compression.

Abstract — This paper presents a fixed-to-variable variation of the
Ziv-Lempel code called “FVLZ code”, and clarifies its asymptotic
performance with respect to a non-probabilistic model for constrained
sources proposed by Ziv and Lempel. It is shown that the FVLZ code has
almost the same asymptotic performance as the Ziv-Lempel code.

I.  Ziv-Lempel  Coding 

In 1977, Ziv and Lempel proposed a universal coding algorithm called the “LZ77 code” [1]. The LZ77 coding algorithm parses the input data into a sequence of phrases of length less than $F$. Each phrase, excluding its last symbol, is the longest matched string in a sliding window consisting of the previously encoded $N - F$ symbols. Each phrase is represented by the position and length of the longest matched string in the window together with the following un-matched symbol, and this triple is encoded into a codeword. Then the window slides to the position just before the next symbol to be encoded.
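
The parsing rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the window starts empty (the original LZ77 initialization is not reproduced), and matches are allowed to run past the window boundary into the phrase being encoded, which is an assumption about a detail the text leaves open.

```python
def lz77_parse(data, window_size, max_phrase_len):
    """Parse `data` into (position, length, next_symbol) triples.

    window_size plays the role of N - F and max_phrase_len the role of F.
    `position` is the backward distance to the start of the match.
    """
    triples = []
    p = 0  # number of symbols encoded so far
    while p < len(data):
        start = max(0, p - window_size)
        best_pos, best_len = 0, 0
        # The phrase (match plus un-matched symbol) must be shorter than
        # max_phrase_len, and one input symbol must remain for the tail.
        limit = min(max_phrase_len - 2, len(data) - p - 1)
        for cand in range(start, p):
            length = 0
            while length < limit and data[cand + length] == data[p + length]:
                length += 1
            if length > best_len:
                best_pos, best_len = p - cand, length
        triples.append((best_pos, best_len, data[p + best_len]))
        p += best_len + 1
    return triples


def lz77_unparse(triples):
    """Invert lz77_parse by replaying the copies symbol by symbol."""
    out = []
    for pos, length, sym in triples:
        for _ in range(length):
            out.append(out[-pos])  # copy the symbol `pos` places back
        out.append(sym)
    return "".join(out)
```

Replaying the triples with `lz77_unparse` recovers the input, which is a convenient sanity check on any parsing variant.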

II.  Description  of  the  Algorithm 
We begin with the description of the FVLZ coding algorithm. Let $A$ be a finite alphabet set with $\alpha$ elements, where $\alpha \ge 2$. Let $C_i$ be the $i$th FVLZ codeword, which is obtained by concatenating some intermediate codewords $C_i^j$ $(1 \le j \le E(i))$ described later, i.e., $C_i = C_i^1 C_i^2 \cdots C_i^{E(i)}$. Let $d(C_i^j)$ denote the length of the input data encoded into the $j$th intermediate codeword $C_i^j$, and let

$$l_i^j = l_i^{j-1} + d(C_i^{j-1}), \quad j \ge 2,$$

where $l_i^1 = 0$. Assume that $p$ indicates the number of encoded symbols. Then, the FVLZ coding algorithm can be described as follows:

Step 1 (Initialization) The sliding window is initialized in the same manner as in the LZ77 coding algorithm.

Step 2 (Encoding) Obtain the intermediate codeword $C_i^j$ by using the LZ77 coding algorithm, assuming that the sliding window consists of the previously encoded $N - F + l_i^j$ symbols and that the length of the longest matched string is less than $F - l_i^j$. Then, the contents of $C_i^j$ are represented with the lengths specified in Table 1. If $p \bmod F = 0$, output the $i$th codeword $C_i = C_i^1 C_i^2 \cdots C_i^{E(i)}$ and refresh the sliding window by shifting $F$ symbols to obtain the next FVLZ codeword. Repeat Step 2 until the whole input data is encoded. □
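
The two steps above can be sketched as follows. This is a hedged illustration rather than the paper's encoder: the names `fvlz_encode`, `fvlz_decode` and `ceil_log` are hypothetical, the window starts empty instead of being initialized as in LZ77, and the field lengths are counted in base-$\alpha$ digits, which is an assumption about the logarithm base.

```python
def ceil_log(n, base):
    """Smallest k >= 0 with base**k >= n (integer arithmetic only)."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k


def fvlz_encode(data, N, F, alpha):
    """Return one codeword per block of F input symbols.

    Each codeword is a list of intermediate codewords
    (position, length, symbol, digits), where `digits` is the total
    field length for that intermediate codeword.
    """
    codewords = []
    for block_start in range(0, len(data), F):
        block_end = min(block_start + F, len(data))
        window_start = max(0, block_start - (N - F))
        intermediates = []
        l = 0  # symbols of this block already encoded
        while block_start + l < block_end:
            p = block_start + l
            limit = block_end - p - 1  # match length is less than F - l
            best_pos, best_len = 0, 0
            for cand in range(window_start, p):
                length = 0
                while length < limit and data[cand + length] == data[p + length]:
                    length += 1
                if length > best_len:
                    best_pos, best_len = p - cand, length
            digits = (ceil_log(N - F + l, alpha)   # starting position
                      + ceil_log(F - l, alpha)     # longest length
                      + ceil_log(alpha, alpha))    # un-matched symbol
            intermediates.append((best_pos, best_len, data[p + best_len], digits))
            l += best_len + 1
        codewords.append(intermediates)
    return codewords


def fvlz_decode(codewords):
    """Replay all intermediate codewords in order."""
    out = []
    for intermediates in codewords:
        for pos, length, sym, _digits in intermediates:
            for _ in range(length):
                out.append(out[-pos])
            out.append(sym)
    return "".join(out)
```

Because the match length is capped by the symbols remaining in the block, every codeword covers exactly $F$ input symbols (except possibly the last), which is the fixed-to-variable property the abstract refers to.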

Table 1: Lengths of intermediate codewords

Contents to be represented            Length
Starting position                     $\lceil \log(N - F + l_i^j) \rceil$
Longest length                        $\lceil \log(F - l_i^j) \rceil$
Last symbol (un-matched symbol)       $\lceil \log \alpha \rceil$

IV. Analysis

In this section, we clarify the asymptotic performance of the FVLZ code and compare it with that of the LZ77 code [1]. To this end, we employ the following model for constrained sources, which was defined by Ziv and Lempel [1]. Let $A^*$ denote the set of all strings of finite length over $A$. Given a string $S \in A^*$ of length $l(S)$ and a positive integer $m \le l(S)$, $S\{m\}$ denotes the set of all substrings of length $m$ contained in $S$. Given a subset $\sigma$ of $A^*$, let $\sigma\{m\} = \{S \in \sigma \mid l(S) = m\}$, and let $\sigma(m)$ denote the cardinality of $\sigma\{m\}$. Then, a subset $\sigma$ of $A^*$ is called a source if the following three properties hold: 1) $A \subset \sigma$, 2) $S \in \sigma$ implies $SS \in \sigma$, 3) $S \in \sigma$ implies $S\{m\} \subset \sigma\{m\}$. With every source $\sigma$, we associate a sequence $h(1), h(2), \ldots$ of parameters, called the h-parameters of $\sigma$, where $h(m) = \frac{1}{m}\log(\sigma(m))$, $m = 1, 2, \ldots$. Let the compression ratio $\rho$ be

$$\rho = \frac{\text{total length of codewords}}{\text{encoded source length}}. \qquad (1)$$
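
To make the notation concrete, the following sketch computes $S\{m\}$ for a finite string and a finite-string stand-in for $h(m)$. The function names are hypothetical, a true source $\sigma$ is an infinite set that this code does not model, and taking the logarithm to base $\alpha$ is an assumption.

```python
import math


def substrings_of_length(s, m):
    """Return S{m}: the set of all length-m substrings contained in s."""
    return {s[i:i + m] for i in range(len(s) - m + 1)}


def empirical_h(s, m, alpha):
    """(1/m) * log_alpha |S{m}|, an empirical analogue of h(m)."""
    return math.log(len(substrings_of_length(s, m)), alpha) / m


window = "abbbbaababbababa"
for m in (1, 2, 3, 4):
    print(m, len(substrings_of_length(window, m)), empirical_h(window, m, 2))
```

A value of `empirical_h` below 1 indicates that fewer than $\alpha^m$ distinct substrings of length $m$ occur, which is the kind of constraint the h-parameters quantify.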

Now,  we  can  state  the  following  result. 

Theorem 1 If the length of the sliding window $N$ for a source with known h-parameters is chosen as $N = F M_F$, where

$$M_F = (F-1)\left\{\sum_{m=1}^{\lambda} \alpha^m + \sum_{m=\lambda+1}^{F-1} \sigma(F-1)\right\} + F, \qquad \lambda = \lfloor (F-1)\,h(F-1) \rfloor,$$

then the compression ratio $\rho$ attainable by the FVLZ code satisfies

$$\rho \le h(F-1) + \epsilon(F), \qquad (2)$$

where $\epsilon(F) = (3 + \log(F-1) + 3\log F)/(F-1)$. □
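
As a quick numerical illustration of how the redundancy term in Eq. (2) behaves, the sketch below evaluates $\epsilon(F)$ for a few values of $F$. The function name is hypothetical and the logarithm base (2 by default) is an assumption, since the text does not state it here.

```python
import math


def epsilon_fvlz(F, base=2.0):
    """Evaluate (3 + log(F - 1) + 3 log F) / (F - 1) for F >= 2."""
    return (3 + math.log(F - 1, base) + 3 * math.log(F, base)) / (F - 1)


for F in (8, 64, 512, 4096):
    print(F, epsilon_fvlz(F))
```

The term decays roughly like $(4 \log F)/F$, so the bound approaches $h(F-1)$ as $F$ grows.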

By using Theorem 1, we can show the universality of the FVLZ code in the same manner as Ziv and Lempel did for the LZ77 code in Ref. [1]. Since $\epsilon(F)$ in Eq. (2) is equal to that of the LZ77 code up to the coefficient of the highest-order term, the FVLZ code has almost the same asymptotic performance as the LZ77 code. Further, experimental results reveal that the FVLZ code and the LZ77 code provide almost the same performance in terms of compression ratio and encoding/decoding time, and that the FVLZ code requires almost the same amount of memory as the LZ77 code.
[Figure 1 panel: the sliding window of the FVLZ code, showing the already encoded strings $B_i(1,8)$ and $B_i(1,12)$ and the string to be encoded $B_i(9,16)$.]
III.  Example 

Fig. 1 shows an example of the FVLZ encoding for $A = \{a, b\}$, $N = 16$ and $F = 8$. $B_i(1,N)$ in Fig. 1 denotes a string in the sliding window used for the encoding of the $i$th FVLZ codeword $C_i$ (i.e., $B_i(1,N)$ = abbbbaababbababa).

1  e-mail:  k-iwata@jaist.ac.jp 


[Figure 1 panels: output of the $i$th codeword $C_i$; the window is then shifted $F$ (= 8) symbols to the right, giving the already encoded string $B_{i+1}(1,8)$ and the string to be encoded $B_{i+1}(9,16)$. Legend: matched symbol, un-matched symbol.]

Figure 1: Encoding by FVLZ code with N=16 and F=8.
Abstract  —  This  paper  presents  a  new  variation  of
the  Ziv-Lempel  code  called  “Partial  Decodable  Ziv- 
Lempel  (PDLZ)  code”,  which  can  decode  a  part  of 
the  encoded  data  from  a  sequence  of  codewords. 

I.  Ziv-Lempel  Coding 

In 1977, Ziv and Lempel proposed a universal coding algorithm called the “LZ77 code” [1]. The LZ77 coding algorithm parses the input data into a sequence of phrases of length less than $L$. Each phrase, excluding its last symbol, is the longest matched string in a sliding window consisting of the previously encoded $N - L$ symbols. Each phrase is represented by the position and length of the longest matched string in the window together with the following un-matched symbol, and this triple is encoded into a codeword. Then the window slides to the position just before the next symbol to be encoded. Now, we define the matched relation as the relation between the $i$th symbol of the longest matched string in the sliding window and the $i$th symbol of the parsed phrase to be encoded. Note that if $a_1$ and $a_2$ are in matched relation, and $a_2$ and $a_3$ are in matched relation, then $a_1$ and $a_3$ are also in matched relation.

II. Description of the Algorithm

For each symbol in the sliding window, let the quotation symbol be the oldest symbol in matched relation with that symbol. Then, the quotation symbol has been encoded as an un-matched symbol. Let the quotation set be the set of the previously encoded $K$ symbols. Then, the PDLZ coding algorithm can be described as follows:

Step 1 (Initialization) The sliding window is initialized in the same manner as in the LZ77 coding algorithm.

Step 2 (Encoding) Obtain the next phrase to be encoded in the same manner as in the LZ77 coding algorithm. Then, execute the following procedure:

Case 1: If the quotation symbols corresponding to the obtained phrase are all in the quotation set, then the phrase is encoded into the $i$th codeword $C_i$ in the same manner as in the LZ77 coding algorithm.

Case 2: Otherwise, divide the obtained phrase into substrings such that the last symbol of each substring, except for the last substring, has its quotation symbol outside the quotation set. Then, each substring is encoded into an intermediate codeword $C_i^j$, $j = 1, 2, \ldots$, in a manner similar to the LZ77 coding algorithm, and $C_i$ is obtained by concatenating the $C_i^j$.

Refresh the sliding window to obtain the next codeword in the same manner as in the LZ77 coding algorithm. Repeat Step 2 until the whole input data is encoded. □

Fig. 1 shows an example of the PDLZ encoding for an input alphabet set $A = \{a, b\}$, $N = 12$, $L = 6$ and $K = 6$. For each symbol in the sliding window, the quotation window shown in Fig. 1(i) stores the position of the corresponding quotation symbol in terms of the length from the next symbol to be encoded.

1 e-mail: k-iwata@jaist.ac.jp

III. Analysis

In this section, we clarify the asymptotic performance of the PDLZ code. To this end, we employ the following model for constrained sources, which was defined by Ziv and Lempel [1]. Let $A$ be a finite alphabet set with $\alpha$ elements, where $\alpha \ge 2$, and let $A^*$ denote the set of all strings of finite length over $A$. Given a string $S \in A^*$ of length $l(S)$ and a positive integer $m \le l(S)$, $S\{m\}$ denotes the set of all substrings of length $m$ contained in $S$. Given a subset $\sigma$ of $A^*$, let $\sigma\{m\} = \{S \in \sigma \mid l(S) = m\}$, and let $\sigma(m)$ denote the cardinality of $\sigma\{m\}$. Then, a subset $\sigma$ of $A^*$ is called a source if the following three properties hold: 1) $A \subset \sigma$, 2) $S \in \sigma$ implies $SS \in \sigma$, 3) $S \in \sigma$ implies $S\{m\} \subset \sigma\{m\}$. With every source $\sigma$, we associate a sequence $h(1), h(2), \ldots$ of parameters, called the h-parameters of $\sigma$, where $h(m) = \frac{1}{m}\log(\sigma(m))$, $m = 1, 2, \ldots$. Let the compression ratio $\rho$ be

$$\rho = \frac{\text{total length of codewords}}{\text{encoded source length}}. \qquad (1)$$

Now, we can state the following result.

Theorem 1 Assume that for a source with known h-parameters, the length of the sliding window $N$ is chosen as

$$N = (L-1)\left\{\sum_{m=1}^{\lambda} (L-m)\,\alpha^m + \sum_{m=\lambda+1}^{L-1} (L-m)\,\sigma(L-1)\right\} + L,$$

where $\lambda = \lfloor (L-1)\,h(L-1) \rfloor$. Further, let $K$ be specified by

$$K = (N-L)^{1+\zeta}, \quad \zeta > 0. \qquad (2)$$

Then, the compression ratio $\rho$ attainable by the PDLZ code satisfies

$$\rho \le \{h(L-1) + \epsilon_1(L)\}\{1 + \epsilon_2(L)\}, \qquad (3)$$

where $\epsilon_1(L) = (3 + 3\log(L-1) + \log(L/2))/(L-1)$ and $\epsilon_2(L) = 2^{\zeta}/\{(L-1)^{2\zeta-1} L^{\zeta}\}$. □

Theorem 1 implies the following corollary.

Corollary 1 If $K > (N-L)^{4/3}$, then the PDLZ code is a universal code in the sense of Ziv and Lempel [1]. □
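
The quotation-symbol bookkeeping used by the PDLZ algorithm can be sketched as follows. This is an illustrative reading of the definitions, not the authors' code: the function names are hypothetical, phrases are given as LZ77-style triples (backward distance, match length, un-matched symbol), and only the Case 1 test is shown, not the phrase splitting of Case 2.

```python
def quotation_indices(triples):
    """For every decoded position, return the index of its quotation symbol.

    A matched symbol inherits the quotation symbol of the symbol it is
    copied from (the matched relation is transitive); an un-matched
    symbol is its own quotation symbol.
    """
    quote = []
    for pos, length, _sym in triples:
        for _ in range(length):
            quote.append(quote[len(quote) - pos])  # inherit from the source
        quote.append(len(quote))  # un-matched symbol quotes itself
    return quote


def case1_applies(quote, phrase_start, match_len, K):
    """True if all quotation symbols of the matched part of a phrase lie
    among the K symbols encoded immediately before the phrase."""
    oldest_allowed = phrase_start - K
    return all(quote[q] >= oldest_allowed
               for q in range(phrase_start, phrase_start + match_len))


# Hypothetical parse of "ababa": 'a', 'b', then a copy of "ab" followed by 'a'.
triples = [(0, 0, "a"), (0, 0, "b"), (2, 2, "a")]
quote = quotation_indices(triples)
print(quote, case1_applies(quote, 2, 2, K=2), case1_applies(quote, 2, 2, K=1))
```

Keeping every quotation symbol within the last $K$ positions is what allows a decoder to start from the middle of the codeword sequence after seeing a bounded amount of earlier data, which is the partial decodability the abstract describes.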
Figure 1: PDLZ encoding with N=12, L=6 and K=6.
Abstract — This paper presents a multicarrier signaling technique for an asynchronous Direct Sequence (DS) Code Division Multiple Access (CDMA) system which employs linear convolutional codes to achieve frequency diversity performance gains in excess of the path diversity gains realized in conventional single carrier RAKE DS CDMA systems.

I.  Overview 

DS  CDMA  is  a  popular  signaling  technique  in  which  binary  data 
sequences  for  multiple  access  users  are  modulated  by  unique 
spreading  signature  sequences  having  bandwidth  much  greater  than 
that  of  the  data.  Waveforms  are  transmitted  simultaneously  over  the 
same  frequency  band  and  are  distinguished  at  the  receiver  via  a 
correlation  operation  against  the  spreading  code  of  the  user-of- 
interest.  We  consider  a  slowly  varying,  Rayleigh  fading  multipath 
channel,  where  the  spread  bandwidth  exceeds  the  coherence 
bandwidth  of  the  channel,  and,  thus,  the  signals  are  said  to  fade  in  a 
frequency  selective  manner.  In  such  systems,  a  RAKE  receiver  is 
often  employed  to  combine  the  energy  received  over  several 
resolvable  propagation  paths. 

We present an alternative system where the available frequency bandwidth is decomposed into M distinct sub-bands, each of bandwidth equal to the coherence bandwidth of the channel. The
sub-channels,  therefore,  tend  to  fade  non-selectively,  and  are 
assumed  to  fade  independently.  In  short,  we  exchange  path 
diversity  for  frequency  diversity,  wherein  forward  error  correction 
may  be  utilized  without  the  penalty  of  bandwidth  expansion. 

II.  Summary 

The data sequence for a given user is input to a rate 1/M convolutional encoder (where M is the number of carriers), and each of the M outputs is multiplied by a spreading sequence which, in turn, modulates one of the M carrier tones. The receiver utilizes coherent
BPSK  detection  and  weights  the  outputs  of  each  correlator  in  an 
optimum  fashion.  These  outputs  are  then  used  to  calculate  branch 
metrics  in  a  soft  decision  Viterbi  decoder.  Whereas  the 
conventional  DS  CDMA  system  experiences  path  diversity  on  the 
order  of  the  number  of  resolvable  paths,  the  coded  multicarrier  DS 
CDMA  system  experiences  frequency  diversity  on  the  order  of  the 
number  of  carriers  plus  an  effective  diversity  improvement  on  the 
order  of  the  minimum  free  distance  of  the  convolutional  code  [1]. 
The  diversity  gains  realized  make  for  significant  improvements  in 
user  capacity,  while  preserving  the  desirable  properties  exhibited  in 
DS  CDMA  systems:  robustness  to  fading,  tolerance  to  multiple 
access  interference,  and  a  narrowband  interference  suppression 
effect  [2]. 

The  performance  of  the  coded  multicarrier  system  is  compared  to 
that  of  a  conventional  single  carrier  system  in  the  presence  of 
additive  white  Gaussian  noise,  multiple  access  interference,  and 
Gaussian  partial-band  interference.  It  can  be  shown  that  the  outputs 
of  the  M  sub-channel  correlators  are  approximately  conditionally 
Gaussian,  conditioned  on  the  respective  channel  fade  amplitudes 
[3].  We  derive  the  optimal  correlator  weights  and  branch  metrics 
for  the  soft  decision  decoder  using  standard  methods  [1]. 

To obtain an upper bound on the average probability of bit error, we assume that the all-zero path is sent and consider the event that some competing path is selected. This is accomplished by developing a convolutional code generating function evaluated in terms of an exponential upper bound on the probability of a pairwise error event [1]. Since the variances of the sub-channel correlator outputs may be different, due to partial-band interference, we consider the pairwise error event of a competing path containing precisely $d_i$ code bit errors in the $i$th bit location (i.e., $i$th sub-channel). It can be shown that the Chernoff bound on this probability is

$$P_2(d_1, \ldots, d_M) \le \prod_{i=1}^{M} \left(\frac{1}{1+\gamma_i}\right)^{d_i},$$

where $\gamma_i$ is the average signal-to-noise ratio of the $i$th channel. It is then straightforward to develop a generating function for a particular convolutional code which enumerates not just the number of code bit errors over a path, but the location (i.e., sub-channel) of those bit errors, whereupon the probability of bit error may be union bounded as

$$P_b \le \left.\frac{\partial T(D_1, \ldots, D_M, N)}{\partial N}\right|_{N=1,\; D_i = \frac{1}{1+\gamma_i},\; i=1,\ldots,M}.$$
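
The two bounds above can be evaluated numerically with a short sketch. The function names and the list of error paths are hypothetical: a real evaluation would enumerate paths from the code's generating function, whereas here a truncated list of (information bit errors, per-sub-channel code bit errors) pairs is simply assumed.

```python
def pairwise_error_bound(d, gamma):
    """Chernoff bound prod_i (1 / (1 + gamma_i)) ** d_i for one error path."""
    bound = 1.0
    for d_i, gamma_i in zip(d, gamma):
        bound *= (1.0 / (1.0 + gamma_i)) ** d_i
    return bound


def union_bound(paths, gamma):
    """Truncated union bound on the bit error probability.

    Differentiating T with respect to N at N = 1 weights each path by its
    number of information bit errors b, so the bound is the sum of
    b * prod_i D_i ** d_i with D_i = 1 / (1 + gamma_i).
    """
    return sum(b * pairwise_error_bound(d, gamma) for b, d in paths)


# Hypothetical example: M = 4 sub-channels with equal average SNR.
gamma = [1.0, 1.0, 1.0, 1.0]
paths = [(1, [1, 1, 1, 1]), (2, [2, 1, 1, 2])]
print(union_bound(paths, gamma))
```

Tracking the error counts per sub-channel matters because a sub-channel hit by partial-band interference has a smaller $\gamma_i$ and therefore contributes a factor closer to 1.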


To analyze and compare these systems, we selected raised-cosine chip wave-shaping filters with 50% excess bandwidth. Single carrier RAKE system performance is taken as equivalent to that of 4th order path diversity reception using maximal-ratio combining [1]. The multicarrier system employs 4 carriers, and thus, rate 1/4 codes of varying constraint lengths [4]. We hold total system bandwidth, information rate, and energy-per-bit constant. The figure below depicts the upper bound on the BER for multicarrier systems as a function of the number of multiple access users for $E_b/N_0$ fixed at 12 dB. At a BER of $10^{-3}$, significant capacity gains are realized as an increasing function of the code constraint length.
Abstract  —  The  purpose  of  this  paper  is  to  examine
the  effects  of  the  power  control  technique,  the  cod¬ 
ing,  and  the  interleaving  depth  on  the  performance  of 
code-division  multiple-access  (CDMA)  systems  with 
different  chip  rates  and  rake  receivers  with  different 
numbers  of  taps.  We  consider  the  implications  of  the 
results  for  the  support  of  voice  and  data  services  in  a 
cellular  CDMA  system. 

Direct-sequence  (DS)  spread  spectrum  CDMA  is  a  leading 
candidate  for  use  in  mobile  cellular  systems  and  personal  com¬ 
munication  systems.  Important  characteristics  of  a  CDMA 
system  include  the  chip  rate,  the  power-control  technique, 
the  forward  error-correction  (FEC)  code,  the  depth  of  code¬ 
symbol  interleaving,  and  the  number  of  taps  in  the  rake  re¬ 
ceiver.  Though  the  development  of  emerging  CDMA  systems 
has  focused  primarily  on  the  support  of  voice  communications, 
the  increasing  demand  for  packet  data  services  points  to  the 
need  for  systems  that  efficiently  support  both  voice  and  data 
traffic. 

The  effect  of  near-far  interference  [1]  in  a  cellular  CDMA 
system  can  be  reduced  by  adapting  the  power  of  each  trans¬ 
mitter  to  the  channel  response  or  the  interference  environ¬ 
ment.  In  a  full-duplex  voice  connection,  the  forward  (base 
station  to  mobile)  link  can  serve  in  part  as  a  feedback  channel 
for power-control commands from the base station. This is referred to as closed-loop power control. We consider the effect of feedback delay on the performance of a CDMA system with closed-loop power control, and the effect is examined for several channels and for systems of different chip rates and different numbers of taps in the rake receiver.

In  contrast,  data  traffic  on  the  reverse  link  is  likely  to  be 
bursty.  In  many  instances,  it  is  not  practical  to  provide  feed¬ 
back  during  the  transmission  of  a  data  packet.  As  a  result,  the 
mobile  must  determine  a  priori  the  appropriate  power  level  for 
the  entire  packet.  This  is  referred  to  as  average  power  control. 
Some  compensation  for  rapid  fading  can  be  obtained  by  using 
FEC  coding  together  with  interleaving  as  a  form  of  time  diver¬ 
sity.  We  consider  the  effect  of  coding  and  interleaving  on  the 
performance  of  a  CDMA  system  with  average  power  control, 
and  we  examine  its  effectiveness  for  different  channels  and  for 
systems  with  different  chip  rates  and  different  numbers  of  rake 
receiver  taps. 

The  ability  of  the  receiver  to  resolve  multipath  components 
of  the  received  signal  depends  on  the  chip  rate  of  the  DS  sig¬ 
nal.  Our  channel  model  reflects  this  phenomenon  and  allows 
for  tractable  analysis  of  receiver  performance.  Each  chan¬ 
nel  is  a  special  case  of  the  Gaussian  wide-sense-stationary 
uncorrelated-scattering channel, and it is described in detail in [2]. A closed-form expression is derived in [3] for the probability of error at the input to the decoder for a CDMA system that employs closed-loop power control and rake reception. The performance of the system is assumed to be limited by multiple-access interference, and the composite interference is modeled as additive white Gaussian noise. The result is
employed  here  to  determine  two  quantities  of  interest  -  the 
spectral  efficiency  [3]  of  the  cell  and  the  average  signal-to- 
interference  ratio  (SIR)  that  is  required  to  achieve  a  target 
error  probability.  For  a  given  traffic  mix  and  collection  of 
channels,  the  relationship  between  required  SIR  and  spectral 
efficiency  varies  with  the  chip  rate,  the  power-control  feedback 
delay,  the  FEC  code,  and  the  interleaving  depth. 

In contrast to the result in [3], no closed-form expression can be obtained for the probability of error at the output of the decoder. Chernoff-bound techniques can be used to evaluate
the  performance  of  coding  and  finite  interleaving  depth,  but 
the  bounds  fail  to  converge  for  many  circumstances  of  inter¬ 
est  in  mobile  communications.  Thus,  we  employ  simulations 
to  examine  the  effect  of  persistent  fading  on  the  performance 
of  CDMA  systems  with  average  power  control,  convolutional 
coding,  and  interleaving.  The  receiver  that  is  considered  em¬ 
ploys  Viterbi  decoding. 

We  have  obtained  numerical  results  for  many  circumstances 
that  are  encountered  in  mobile  communications.  It  is  shown 
that  the  performance  of  a  CDMA  system  with  a  low  chip  rate 
is  more  sensitive  to  channel  and  system  parameters  than  is 
a  CDMA  system  with  a  high  chip  rate.  The  rake  receiver  is 
necessary  for  adequate  performance  of  a  low-chip-rate  system 
under  many  more  circumstances  than  for  a  high-chip-rate  sys¬ 
tem.  The  best  choice  of  chip  rate  for  a  system  with  closed-loop 
power  control  depends  on  the  ratio  of  the  maximum  Doppler 
spread  to  the  feedback  delay,  and  it  also  depends  on  the  allow¬ 
able  number  of  taps.  For  a  CDMA  system  employing  average 
power  control,  coding,  and  interleaving,  a  high  chip  rate  pro¬ 
vides  performance  superior  to  that  of  a  low  chip  rate  in  most 
circumstances. 
Summary -  Given  an  ordered  J  group  partition  of  the
K  simultaneously  transmitting  users  of  a  CDMA  channel, 
a  sequential  group  detector  consists  of  J  group  detectors 
that  are  connected  sequentially.  The  jth  group  detector 
uses  the  decisions  from  the  previous  j  —  1  group  detec¬ 
tors  and  cancels  the  inter-user  interference  from  those 
users  before  it  makes  joint  decisions  for  the  jth  group. 
This  successive  interference  cancellation  scheme  was  in¬ 
troduced  in  [1]  for  the  Gaussian  CDMA  channel.  This 
paper  consists  of  extending  that  idea  to  the  Frequency- 
Selective  Rayleigh  Fading  (FSRF)  CDMA  channel  (de¬ 
scribed  in  [2]).  The  two  group  detectors  (I  and  II)  in¬ 
troduced  in  [2]  for  the  FSRF-CDMA  channel  are  consid¬ 
ered  as  the  basic  building  blocks.  The  resulting  sequen¬ 
tial  group  detectors  can  be  regarded  as  members  of  two 
distinct  classes  (each  class  parametrized  by  the  ordered 
partition)  of  multiuser  detectors  that  satisfy  a  wide  range 
of complexity constraints. In particular, each of the two sequential group detectors has a time complexity per symbol (TCS) of $O(\sum_{j=1}^{J} M^{K_j})$ for M-ary signalling, where $K_j$ is the $j$th group size. The optimum multiuser detector has a fixed TCS of $O(M^K)$. The simplest case
corresponds  to  the  degenerate  ordered  partitions  consist¬ 
ing  of  K  groups  of  size  1  each.  For  this  choice,  the  two 
sequential  group  detectors  reduce  to  two  distinct  decor- 
relating  decision  feedback  detectors.  These  special  cases 
can  be  seen  as  two  distinct  generalizations  (to  the  FSRF- 
CDMA  channel)  of  the  multiuser  detector  by  the  same 
name  for  the  Gaussian  channel  that  was  proposed  in  [3] 
and  for  which  the  analysis  can  be  found  in  [1].  A  succinct
indicator  of  the  average  BER  over  high  SNR  regions  for 
the  FSRF-CDMA  channel  is  defined  via  the  asymptotic 
efficiency  in  [2].  In  this  work,  upper  and  lower  bounds  on 
the  asymptotic  efficiency  for  the  two  sequential  group  de¬ 
tectors  are  derived.  Minimax  criteria  under  which  these 
detectors  are  optimal  are  specified.  The  following  numer¬ 
ical  example  illustrates  the  vast  improvements  achievable 
by  the  sequential  group  detector  based  on  the  group  de¬ 
tector  II  of  [2]  over  the  detector  proposed  in  [4]. 

Numerical  Example-  Consider  the  six  user  direct- 
sequence  spread-spectrum  system  employing  Gold  se¬ 
quences  of  length  31  of  [2]  operating  in  a  fading  multipath 
environment  with  four  paths  for  each  user.  The  users 
are  numbered  according  to  decreasing  average  power 
ratios  (with  respect  to  the  minimum  power)  given  in 
order  as  [10.0,2.5,2.0,1.5,1.25,1.0].  Suppose  that  the 
performance  of  a  single-user  RAKE  receiver  for  the 
last (weakest) user in the hypothetical single-user scenario, where all the other users are absent, is considered acceptable. Equivalently, the ratio of effective SNR (ESNR) to minimum actual SNR (henceforth referred to as relative ESNR) for every user has to be no
less  than  1.  The  linear  suboptimum  detector  of  [4]  re¬ 
sults  in  relative  ESNRs  for  the  six  users  given  in  or¬ 
der  as  [1.40,0.37,0.53,0.21,0.18,0.26].  As  for  sequen¬ 
tial  group  detection,  it  turns  out  that  the  decorrelat- 
ing  decision-feedback  detector  suffices.  The  resulting  up¬ 
per  bounds  for  the  relative  ESNRs  for  the  six  users  are 
[1.40,1.22,1.12,1.06,1.09,1.0]  and  the  lower  bounds  are 
[1.40,1.22,1.12,1.06,1.06,1.0].  The  minimum  specifica¬ 
tion  is  met.  Moreover,  note  that  the  upper  and  lower 
bounds  coincide  for  all  but  the  fifth  user.  Now  suppose 
that  the  power  ratios  are  made  less  disparate  by  reduc¬ 
ing  them  for  the  odd-numbered  users  so  that  the  power 
distribution  for  the  six  users  is  [2.5,  2.5, 1.5, 1.5, 1.0, 1.0]. 
The  relative  ESNRs  for  the  six  users  for  the  linear 
suboptimum  detector  are  [0.35,  0.37, 0.39,  0.21, 0.15, 0.26]. 
The  upper  bounds  for  the  relative  ESNRs  for  the  six 
users  for  the  decorrelating  decision  feedback  detector  are 
[0.35, 1.22, 0.84, 1.06,  0.87, 1.0]  with  a  lower  bound  of  0.35 
for  all  the  users.  The  wide  gap  between  the  upper  and 
lower bounds for users 2 to 6 suggests that error propagation is a severe problem in spite of the users being arranged in decreasing order of powers. However, consider
a  sequential  group  detector  with  an  ordered  group  par¬ 
tition  {1, 2}{3, 4}{5, 6}  consisting  of  three  groups  of  size 
two  each.  In  this  case,  relative  ESNRs  for  the  six  users 
are  equal  to  [1.16,1.22,1.115,1.114,1.0,1.0]  (the  upper 
and  lower  bounds  coincide  for  every  user  in  this  case). 
The  minimum  requirement  is  thus  met.  Moreover,  er¬ 
ror  propagation  has  little  effect  on  the  performance.  The 
lower  complexity  of  the  sequential  group  detector  however 
is  achieved  at  the  expense  of  approximately  3  dB  loss  for 
the  strongest  users  and  nearly  1.25  dB  for  the  users  of 
intermediate  power  relative  to  the  optimum  detector. 
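
The closing trade-off can be checked roughly with the sketch below. Two assumptions are made that the source does not state: binary signalling (M = 2) for the complexity count, and an optimum detector whose relative ESNR equals each user's power ratio (the single-user level), used as the reference for the dB loss. The function names are hypothetical.

```python
import math


def tcs_sequential(group_sizes, M):
    """Order of the time complexity per symbol: sum over groups of M ** K_j."""
    return sum(M ** k for k in group_sizes)


def tcs_optimum(K, M):
    """Order of the time complexity per symbol of the optimum detector."""
    return M ** K


def loss_db(power_ratio, relative_esnr):
    """Loss in dB relative to the assumed single-user reference level."""
    return 10 * math.log10(power_ratio / relative_esnr)


powers = [2.5, 2.5, 1.5, 1.5, 1.0, 1.0]
esnrs = [1.16, 1.22, 1.115, 1.114, 1.0, 1.0]  # partition {1,2}{3,4}{5,6}
losses = [loss_db(p, e) for p, e in zip(powers, esnrs)]

print(tcs_sequential([2, 2, 2], M=2), tcs_optimum(6, M=2))
print(losses)
```

Under these assumptions the losses for the strongest and intermediate users come out a little above 3 dB and a little above 1.25 dB respectively, in line with the figures quoted in the text.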
Abstract  —  We  develop  an  adaptive  interference
suppression  scheme  for  DS-CDMA  systems  in  the 
presence  of  impulsive  noise.  This  scheme  is  realized 
by  deriving  an  IPA  based  stochastic  gradient  algo¬ 
rithm  that  minimizes  the  average  probability  of  error 
for  such  systems.  The  resulting  detector  outperforms 
the  conventional  matched  filter  detector  for  such  sys¬ 
tems. 

I.  Introduction 

Recently, there has been much work on deriving adaptive linear detectors for DS-CDMA systems corrupted by additive Gaussian noise ([1] and the references therein). However, such communication systems are often subject to noise other than the classical white Gaussian noise. Here we consider DS-CDMA systems corrupted by natural impulsive noise sources, such as those found in low-frequency atmospheric channels, and by man-made impulsive sources, such as those occurring in urban or military radio networks. The conventional correlation receiver has been shown to experience a degradation in performance in impulsive noise (relative to the Gaussian noise model) even when the users' codes are chosen to have low cross-correlations [2]. On the other hand, when the multiple access interference dominates, the linear correlator in the impulsive noise channel is not near-far resistant (similar to the Gaussian channel). In this paper, we develop an adaptive linear detector for such impulsive noise channels which directly minimizes the average probability of bit error. The approach is similar to that used in [3], where we develop an infinitesimal perturbation analysis (IPA) based stochastic gradient algorithm for achieving minimum probability of bit error. The adaptive interference rejection scheme is shown to have a very simple recursive structure (thereby allowing easy implementation), and the conditions for convergence of this algorithm are presented.

II.  System  Description 

We will consider a $K$-user DS-CDMA system where the received signal in the channel is the sum of the transmissions due to the $K$ users and additive channel noise. The received signal due to the transmission of the $k$th user at any receiver is given as

$$y_k(t) = \sqrt{2P_k} \sum_{i=-\infty}^{\infty} b_{i,k}\, d_k(t - iT - \tau_k) \cos(\omega_c t + \phi_k),$$

where $b_{i,k} \in \{-1, +1\}$ is the $i$th bit of the $k$th user, $T$ is the bit period, and $P_k$, $\phi_k$ and $\tau_k$ are the power, carrier phase and delay of the $k$th user, respectively. The carrier frequency is denoted by $\omega_c$, and $d_k(t)$ is the spreading waveform of the $k$th user. The received signal in the channel due to the $K$ users and additive noise is given as

$$r(t) = \sum_{k=1}^{K} y_k(t) + \eta(t),$$

where $\eta(t)$ is assumed to be the additive impulsive noise that is characterized by the first-order probability density function

$$f_{\eta(t)}(x) = (1 - \epsilon) f_n(x) + \epsilon f_I(x),$$

where $\epsilon \in [0, 1]$, and $f_n$ and $f_I$ are pdf's [2]. The nominal density function $f_n$ is usually taken to be a Gaussian density representing the background noise. The impulsive component of the noise is represented by $f_I$, which is taken to be more heavily tailed than $f_n$. The above model represents an approximation to the canonical Class A interference model studied by Middleton and Spaulding.
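
The $\epsilon$-mixture noise model above can be sketched as follows. The choice of a zero-mean Gaussian with a larger standard deviation for the impulsive component is an assumption made for illustration (the text only requires heavier tails than the nominal density), and the function names and parameter values are hypothetical.

```python
import math
import random


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, eps, sigma_n, sigma_i):
    """(1 - eps) * f_n(x) + eps * f_I(x) with Gaussian f_n and f_I."""
    return (1 - eps) * gaussian_pdf(x, sigma_n) + eps * gaussian_pdf(x, sigma_i)


def sample_noise(n, eps, sigma_n, sigma_i, seed=0):
    """Draw n samples: impulsive with probability eps, nominal otherwise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma_i if rng.random() < eps else sigma_n)
            for _ in range(n)]


samples = sample_noise(1000, eps=0.1, sigma_n=1.0, sigma_i=10.0)
print(max(abs(s) for s in samples))
print(mixture_pdf(5.0, 0.1, 1.0, 10.0), gaussian_pdf(5.0, 1.0))
```

Even a small $\epsilon$ raises the tail density by orders of magnitude relative to the nominal Gaussian, which is why a detector tuned to Gaussian noise can degrade in this channel.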

III.  Adaptive  Interference  Suppression 

In  [2]  a  conventional  correlator  was  used  for  detection  of  the 
desired  user’s  bits.  We  are  interested  in  finding  the  best  set 
of  correlation  sequences  h,  such  that  the  average  probability 
of  bit-error  is  minimized.  Following  the  approach  in  [3],  we 
develop  an  IPA  based  stochastic  gradient  algorithm  that  yields 
the  optimum  linear  detector  for  this  system.  Therefore,  we 
require  the  vector  hm  such  that 

ll  =arg{minPe}.  (1) 

h 

We  now  adaptively  update  the  vector  h,  using  a  stochastic 
algorithm  given  as 

$$\mathbf{h}_{i+1} = \mathbf{h}_i - \gamma\, \nabla_{\mathbf{h}} P_e, \qquad (2)$$

where the gradient $\nabla_{\mathbf{h}} P_e$ is estimated using infinitesimal
perturbation analysis (IPA) from the vector of transmitted signals
for the ith iteration, i.e., the ith bit period. It is shown that
the  algorithm  allows  a  very  simple  recursive  structure  owing  to 
the analyticity of the Q function. The conditions for convergence
of the algorithm are presented as well. The performance
of  the  adaptive  linear  detector  is  seen  to  be  uniformly  better 
than  that  of  the  linear  correlator  even  under  the  extreme  cases 
when  either  the  multiple  access  interference  or  the  impulsive 
noise  is  dominant. 
Abstract — We examine the performance of a convolutionally
coded DS/SSMA communication system with
error-and-erasure decoding in AWGN and multipath
Rayleigh fading channels. The demodulator makes
a three-level decision {-1,1,?} based on the channel
state information (CSI), where ? represents an erasure.
The CSIs considered are the matched filter output
and the fading amplitude. The optimum decision
threshold that minimizes BER is found to be almost
equal to the threshold that maximizes the channel
cut-off rate $R_0$. A simple parallel decision scheme is
proposed to give a performance which is very close
to that of the optimum decision scheme. The performance
improvement made by using the CSI is investigated.

I.  Introduction 

It is well known that soft-decision decoding requires 2-3 dB
less signal-to-noise ratio than hard-decision decoding
in AWGN channels [1]. However, soft-decision decoding requires
real arithmetic operations, which are much more complex
than the binary operations involved in hard-decision decoding.
Clark and Cain have pointed out that erasing unreliable
symbols based on channel state information (CSI) and performing
error-and-erasure decoding is an effective method to
provide a good trade-off between system performance and
implementation complexity [2]. In this paper we analyze the
performance of convolutionally coded DS/SSMA communication
systems employing error-and-erasure decoding and binary
PSK modulation with several demodulation schemes.

II.  Demodulation  Schemes 

We consider several demodulation schemes that make a
three-level decision {-1,1,?} based on the CSI. In the AWGN
channel, we use the matched filter output as the CSI, since it is
convenient, useful, and easy to obtain. If the absolute value of the
matched filter output is larger than a threshold, the demodulator
makes a decision in {-1,1} based on the matched filter output;
otherwise the demodulator erases the corresponding symbol.
In the multipath Rayleigh fading channel we use the matched filter
output and/or the fading amplitude as the CSI. For the case
of a demodulator using only the fading amplitude, we consider
a demodulator that makes a hard decision if the fading amplitude
is larger than a threshold and otherwise erases the symbol.
We assume the fading amplitude information is available at
the demodulator. We also consider a demodulator that uses
both the matched filter output and the fading amplitude. In
this case, if the fading amplitude is below a threshold (Qc) or
the matched filter output is below a threshold (n), the demodulator
erases the symbol; otherwise it makes a hard decision
based on the matched filter output.
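The three erasure rules described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation: the function names are hypothetical, the combined rule is assumed to compare the absolute value of the matched filter output with its threshold, and a matched filter output of exactly zero is arbitrarily decided as +1.

```python
ERASURE = "?"


def hard_decision(y):
    """Binary PSK hard decision on the matched filter output y."""
    return 1 if y >= 0 else -1


def demod_matched_filter(y, threshold):
    """Decide only when |y| exceeds the threshold; otherwise erase."""
    return hard_decision(y) if abs(y) > threshold else ERASURE


def demod_fading_only(y, alpha, alpha_threshold):
    """Decide only when the fading amplitude alpha exceeds its threshold."""
    return hard_decision(y) if alpha > alpha_threshold else ERASURE


def demod_combined(y, alpha, alpha_threshold, mf_threshold):
    """Erase if either the fading amplitude or |y| falls below its threshold."""
    if alpha < alpha_threshold or abs(y) < mf_threshold:
        return ERASURE
    return hard_decision(y)
```

Setting either threshold to zero disables the corresponding erasure test, so plain hard-decision demodulation is a special case of each rule.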


Fig. 1: BER vs. $E_b/N_0$: code rate = 1/2, constraint length =
7, convolutional code, number of users = 30, 128 chips/coded bit,
Rayleigh fading channel ($\sigma^2 = 1/2$)

III.  Discussions 

We have investigated the optimum erasure threshold that
minimizes BER. It is found that the erasure threshold that
maximizes the channel cut-off rate $R_0$ is almost optimal and
that the optimum erasure threshold increases as the traffic
increases. Based on this observation, we propose a simple parallel
decision scheme that changes the erasure threshold according
to the channel traffic. We found that the parallel scheme
yields a performance close to that of the optimum decision
scheme. We have also examined how much performance
improvement can be obtained by using the CSIs in the Rayleigh
fading channel. Fig. 1 shows the BERs with different CSIs. We
can see that erasure based on the fading amplitude
information alone gives a higher BER than erasure based on the
matched filter output alone. However, when the fading amplitude
information is combined with the matched filter output, the fading
amplitude information gives a gain of 1.0-2.5 dB in $E_b/N_0$ at
a BER of $10^{-3}$, and an even higher gain can be obtained at
lower BERs.
Abstract — The performance is evaluated for a
hypothetical CDMA digital cellular telephone system
whose reverse link uses π/4-DQPSK modulation and
equal-gain RAKE diversity combining. The results are
shown numerically in comparison with those for
Qualcomm's (IS-95) CDMA cellular system, which uses
64-ary orthogonal modulation on the reverse link.

INTRODUCTION 

The mobile-to-base (reverse) link of the North American
IS-95 DS-CDMA cellular system employs M-ary
orthogonal modulation using Walsh-Hadamard sequences
with QPSK phase coding, with M = 64. Because the reverse
link lacks the pilot signals that provide a carrier phase
reference on the forward (base-to-mobile) links,
the system employs noncoherent demodulation
on the reverse link for each of the L-path diversity
receptions in its RAKE system.

In this paper we suggest and investigate an
alternative scheme for the reverse link modulation and
multipath reception: the information sequence is to be
π/4-DQPSK modulated after inserting the DS-CDMA
spreading sequence in the I and Q channels prior to
pulse shaping and summation, and the demodulation is
done with partially coherent differential detection for
each multipath component before diversity combining.

We show a closed-form error probability expression
for the π/4-DQPSK reverse link with L independent
multipath diversity receptions in Rayleigh fading and
CDMA interference. The results are evaluated with
capacity and processing gain values as parameters. The
exact closed-form expression for the performance of the
system is based on the authors' previous work [1].

SUMMARY  OF  ANALYTICAL  RESULTS 


where 
and where M is the number of multiple-access users,
PG is the spread spectrum processing gain, F is the
frequency re-use factor, d is a voice activity factor, and
$G_s$ is the sector antenna gain.


For  a  receiver  combining  the  L  >  1  paths,  assuming 
independent  fading  in  the  multiple  paths,  the  prob¬ 
ability  of  error  can  be  shown  to  be 

pd*)  =  H4  ■  -  P2(e)f  •  (3) 

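The values of d, M, F, PG, and $G_s$, which determine the
amount of interference, can be traded off to achieve the
desired system performance. The corresponding error
expression for the diversity combining of M-ary
orthogonal signals is [2]

$$P_b(e) = \frac{M/2}{M-1}\,\frac{1}{\Gamma(L)} \sum_{n=1}^{M-1} \frac{(-1)^{n+1}\binom{M-1}{n}}{(1+n+n\rho_L)^{L}} \sum_{k=0}^{n(L-1)} \beta_k(n)\,\Gamma(L+k) \left(\frac{1+\rho_L}{1+n+n\rho_L}\right)^{k} \qquad (4)$$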

in which the value of $\beta_k(n)$ is the coefficient of $x^k$ in
the expansion

$$\left(\sum_{k=0}^{L-1} \frac{x^k}{k!}\right)^{n} = \sum_{k=0}^{n(L-1)} \beta_k(n)\, x^k. \qquad (5)$$
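The coefficients $\beta_k(n)$ are defined as the coefficients of $x^k$ when the truncated exponential series $\sum_{k=0}^{L-1} x^k/k!$ is raised to the nth power, so they can be obtained by repeated polynomial multiplication. The sketch below does this with exact rational arithmetic; the function name and the list representation (index k holds the coefficient of $x^k$) are choices made for this illustration only.

```python
from fractions import Fraction
from math import factorial


def beta_coefficients(L, n):
    """Return [beta_0(n), ..., beta_{n(L-1)}(n)] for (sum_{k<L} x^k / k!)^n."""
    base = [Fraction(1, factorial(k)) for k in range(L)]
    result = [Fraction(1)]  # the polynomial "1", i.e. the zeroth power
    for _ in range(n):
        # Multiply the running product by the base polynomial once more.
        product = [Fraction(0)] * (len(result) + len(base) - 1)
        for i, a in enumerate(result):
            for j, b in enumerate(base):
                product[i + j] += a * b
        result = product
    return result
```

For L = 2 the base polynomial is 1 + x, so the coefficients reduce to binomial coefficients, which is a quick sanity check on the routine.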


It is found that, in the fading environment, the proposed
reverse link modulation, which is less complex
than the IS-95 64-ary orthogonal modulation, actually
outperforms the latter when L < 2 paths are combined.
We consider examples of capacity and processing gain,
and implications for the error correcting codes needed
for meeting operational requirements for voice or data.


Resolution of multipath signal components separated
in time delay by more than the chip period of the DS-CDMA
SS sequence is possible, and the paths can be
combined to provide diversity. The unconditional probability
of error for the reception of one of L fading
multipath components is found to be


P2(e)  =  2  1 


PL  cos,  (it/ 4) 

~L  cos  (tt/4)]2  +  ] 
Abstract — In this paper, the average interference
parameter (AIP) of polyphase code-sequences in DS-CDMA
systems is investigated. The expected value
and the variance of the AIP for randomly chosen
cyclic shifts of the code-sequences are derived.

I.  Introduction 

The performance of direct-sequence code-division multiple-access
(DS-CDMA) systems depends on the correlation properties
of the code-sequences used. The most common criteria
to describe the correlation behavior are the periodic peak correlation
parameter, describing the worst-case behavior, and
the average interference parameter (AIP), on which the
signal-to-noise ratio depends. Usually, families of code-sequences for
these systems are constructed considering the periodic peak
correlation parameter. In a second step, cyclic shifts of these
sequences which result in an optimum AIP are sought. (The
periodic peak correlation parameter remains unchanged if the
sequences are cyclically shifted.) Since not all combinations of
shifts can be examined, simplified search methods, e.g. based
on the sidelobe energy, are applied. To compare the performance
of these techniques and to derive bounds on the achievable
AIP, the expected value and the variance of the AIP for
randomly chosen shifts are needed. For this reason, these values
will be derived for some of the most important families of
code-sequences.

II.  Investigated  Families 

We consider families $\mathcal{F} = \{S_k(n) \mid 1 \le k \le K\}$ consisting
of K sequences of length N ($0 \le n \le N-1$). The elements
of the sequences are roots of unity. The aperiodic correlation
function is defined by $C_{SR}(m) = \sum_{n=0}^{N-1-m} S^*(n) R(n+m)$
($m \ge 0$) and the periodic correlation function by
$\tilde{C}_{SR}(m) = \sum_{n=0}^{N-1} S^*(n) R((n+m) \bmod N)$. The maximum magnitude of
the periodic crosscorrelation values and the autocorrelation
sidelobes is the periodic peak correlation parameter $\theta$.
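The aperiodic and periodic correlation functions and the periodic peak correlation parameter defined above translate directly into code. The following is a minimal sketch under the assumption that sequences are given as Python lists of (possibly complex) numbers of equal length; the function names are chosen for this illustration.

```python
def aperiodic_correlation(s, r, m):
    """Sum over n = 0..N-1-m of conj(s[n]) * r[n + m], for m >= 0."""
    length = len(s)
    return sum(s[n].conjugate() * r[n + m] for n in range(length - m))


def periodic_correlation(s, r, m):
    """Sum over n = 0..N-1 of conj(s[n]) * r[(n + m) mod N]."""
    length = len(s)
    return sum(s[n].conjugate() * r[(n + m) % length] for n in range(length))


def peak_correlation(family):
    """Largest magnitude over all periodic crosscorrelations and
    autocorrelation sidelobes of the sequences in the family."""
    length = len(family[0])
    peak = 0.0
    for a, s in enumerate(family):
        for b, r in enumerate(family):
            for m in range(length):
                if a == b and m == 0:
                    continue  # skip the autocorrelation main peak
                peak = max(peak, abs(periodic_correlation(s, r, m)))
    return peak


# Example: a binary m-sequence of length 7 mapped to +1/-1.
m_sequence = [-1, -1, -1, 1, -1, 1, 1]
```

For the length-7 m-sequence in the example, every periodic autocorrelation sidelobe equals -1, so a family consisting of that single sequence has a peak correlation parameter of 1.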

Three types of large families of code-sequences have
been investigated: Prime-phase code-sequences (e.g. Gold,
Kasami, and Kumar-Moreno families) are constructed in the
Galois field $GF(p^r)$ using an additive character [1]. Because
of their practical importance, quadriphase code-sequences and
other prime-power-phase sequences are considered. The
construction of these sequences in Galois rings $GR(p^a, r)$ is
described in [2]. The third family is constructed by multiplying
all sequences of families of type 1 or 2 with $\exp(j2\pi kn/N)$
with $k = 0, \ldots, N-1$.

III.  Interference  Parameters 

We consider an asynchronous phase shift keying DS-CDMA
system for K users. The signal-to-noise ratio of these systems
can be expressed in terms of the total interference parameter
TIP [3], which is defined as (at receiver i)

$$\mathrm{TIP}_i = \frac{1}{3N^3} \sum_{k=1,\,k\ne i}^{K} \left[ 2\mu_{S_iS_k}(0) + \mu_{S_iS_k}(1) \right],$$

where $\mu_{S_iS_k}(l) = \mathrm{Re}\left( \sum_{l'=1-N}^{N-1} C^*_{S_iS_k}(l')\, C_{S_iS_k}(l'+l) \right)$. Usually,
the average interference parameter $\mathrm{AIP} = 2\mu(0) + \mu(1)$
with $\mu(l) = \frac{1}{K(K-1)} \sum_{i=1}^{K} \sum_{k=1,\,k\ne i}^{K} \mu_{S_iS_k}(l)$ is used as a
measure for the average system performance. To simplify the
notation, we define the sum $\mathcal{S}(\mathcal{F}, v) = \sum_{S \in \mathcal{F}} C_{SS}(v)$.

IV.  Expected  Values  of  the  AIP 

Since the AIP depends on the cyclic shift of the code-sequences,
we suppose that the cyclic phase of the sequences is
picked at random with each of the shifts being equally likely
to be chosen [4]. Then, the expected value of $\mu(0)$ can be
derived: $E[\mu(0)] =$

N- 1 

+  N2 M{M  —  1)  E ^  *0  -  ^ 

J/=l 

The expected value of $\mu(1)$ and the variance of the AIP
can be expressed in terms of other sums
($\mathcal{S}(\mathcal{F}, v, v+l) = \sum_{S \in \mathcal{F}} C^*_{SS}(v)\, C_{SS}(v+l)$). These sums have also been derived
for all families described in Section II.

V.  Results 

We have investigated families of size $M \approx (N+1)^t$ with
$t \ge 1$. Typical periodic peak correlation parameters $\theta$ are
$\theta \le 2^{t}\sqrt{N+1} + 1$ or $\theta \le 2^{t-1/2}\sqrt{N+1} + 1$ if binary sequences
are considered, or $\theta \le t\sqrt{N+1} + 1$ for sequences with a larger
phase alphabet. In both cases, we found $E(\mu(1)) \ll E(\mu(0))$
and hence

E(AIP)  «  2E(ju(0))  =  2 N2  -  |  (2  Y  ^'|  ~-1-t 

Obviously, E[AIP] does not depend on the size of the phase
alphabet and is nearly independent of the size of the family.
For the described linear families, E[AIP] approaches $2N^2$,
the expected value for random sequences, as N tends to
infinity. For the variance, we found noticeable differences
depending on the investigated families. Using these results, the
known numerical results on the AIP of linear code-sequences
for different selection criteria of cyclic shifts (e.g. LSE/AO,
MSE/AO) can be explained. Moreover, bounds on the achievable
AIP for all linear families are derived.
Abstract — In this paper we present Wavelet Orthogonal
Frequency Division Multiplexing (WOFDM),
which in conjunction with Frequency Hopping can be
used for Synchronous Code Division Multiple Access
(FH/S-CDMA). A Low Probability of Intercept (LPI)
modulation scheme based on a pseudo-random selection
of basis functions for modulation spanning the
same frequency channel is also described.

I.  Introduction 

In [1] the use of scaling functions and wavelets, multiplicity-M
wavelets, and wavelet packets to modulate different information
signals on adjacent channels with overlapping spectra
was proposed. In this paper we demonstrate an application of
this technique to multiple access communication [2].

The envisioned Frequency-Hopped Synchronous Code Division
Multiple Access (FH/S-CDMA) scheme is for
multipoint-to-point fully synchronized communication. In this
environment, wavelets provide great flexibility in controlling the
data rate and hence the power, making the proposed CDMA
scheme inherently adaptive.

II.  Orthogonal  Frequency  Channelization 

The basic techniques to subdivide a given frequency band
into orthogonal subchannels spanned by basis functions derived
from the scaling functions and wavelets are described
in [1]. This defines the WOFDM modulation scheme, which
possesses the following characteristics: (1) orthogonal channels
are spanned by translates of a single envelope function.
The translation step size is directly related to the bandwidth
(BW) of the subchannel; (2) the channels overlap in frequency
but remain orthogonal with proper synchronization; (3) there
is great flexibility in how the available BW is channelized, and
this channelization has a tree structure. It is therefore possible
to accommodate variable rate data modulation by routing
data to different nodes of the tree structure that have different
data rate capacities; (4) this switching induces some transient
intersymbol interference (ISI).

III.  FH/S-CDMA  with  Wavelets 

The described WOFDM scheme can be employed for multiple
access communications using frequency hopping, where
a  given  information  sequence  can  be  hopped  by  routing  the 
data  in  this  sequence  to  the  input  of  the  filter  generating  the 
desired  frequency  channel. 

The key features of this scheme are: (1) there is no need
for a programmable frequency synthesizer; (2) the size of the
hopping BW is related to the information data rate. Changes
in this data rate are accommodated by routing the data to the
appropriate internal nodes of the tree structure. The protocol
for how the variable data rate is to be accommodated should
be established from the outset and programmed into the operation
of the connection network; (3) multicarrier modulation
is possible with the proposed technique. Note that in the proposed
scheme a high degree of security may be afforded to the
communication system using a relatively small number of orthogonal
frequency channels due to the combinatorial power
of the connection network; (4) the hopping rate relative to
the data rate is directly controlled by the rate at which the
connection machine changes its patterns relative to the maximum
rate at which each channel can be utilized; (5) aside from carrier
synchronization needed to perform the down conversion, clock
synchronization and PN code synchronization are needed for
proper operation.

Direct Sequence (DS) spectrum spreading could be incorporated
into the design by forming the product between the
spreading code and the information sequence prior to modulating
the wavelet filters. In this process what controls the
BW of each hopping channel is the PN code rate used for the
DS component.

IV.  Low  Probability  of  Intercept 

We previously noted that the switching of frequency channels
employed in order to accommodate variations in source
data rate causes transient ISI [3]. This transient ISI can
be used to introduce a novel LPI modulation scheme. More
specifically, suppose a given frequency band spanned by a shift
orthogonal function is channelized in a variety of ways. Each
such channelization corresponds to a different distribution of
dimensions in the time-frequency plane. A modulator can be
state dependent and use a given distribution of dimensions
for modulation in accordance with a PN code known to the
transmitter and receiver. Suppose the modulator state varies
rapidly so that a given distribution of dimensions is not used
for more than a few symbol intervals. An unintended receiver
with perfect knowledge of the waveforms used by the
transmitter and perfect knowledge of symbol timing may still
be unable to recover the symbols since it perceives a sequence
with very high, randomly fluctuating ISI [3].

The above procedure can be embedded in the FH/S-CDMA
and the multicarrier modulation scheme proposed
here,  and  two  PN  codes  could  be  used  by  each  information 
source,  one  controlling  the  operation  of  the  switching  network 
used  to  frequency  hop  the  spectrum  of  the  transmitted  signal, 
and  the  other  used  to  select  which  distribution  of  dimensions 
in  the  time-frequency  plane  is  to  be  used  by  the  modulator. 

References 

[1] F. Daneshgaran, M. Mondin, “Wavelets and Scaling Functions
as Envelope Waveforms for Modulation,” IEEE-SP Int.
Symp. on Time-Frequency and Time-Scale Analysis, Philadelphia
(USA), October 25-28, 1994.

[2] F. Daneshgaran, M. Mondin, “Orthogonal Frequency Division
Multiplexing and its Application to Frequency-Hopped
CDMA,” Proc. of the 29th CISS, Philadelphia, MA (USA),
March 1995.

[3] F. Daneshgaran, M. Mondin, “Multidimensional signaling for
bandlimited channels,” Proc. of ISIT 95, Whistler, B.C.
(Canada), September 17-22, 1995.




Near-Orthogonal  Coding  for  Spread  Spectrum  and  Error  Correction 

Karen  W.  Halford,  Yaron  Rozenbaum1,  and  Maite  Brandt-Pearce2 
Dept. of Electrical Engineering, Univ. of Virginia, Charlottesville, VA 22903


Since the spreading operation and error correction must
share the bandwidth available in a CDMA system, it is appropriate
to approach these problems jointly. Several papers have
addressed this problem with promising results [1, 2]. Hui has
shown that under certain assumptions, the system performs
better when more bandwidth is devoted to error correction
[2]. Giallorenzi [3] shows that combining error correction decoding
and multiuser detection significantly improves system
performance. We extend this research by considering not only
simultaneous despreading and decoding, but also simultaneous
encoding and spreading.

We consider a coded asynchronous CDMA system over an
AWGN channel with constant information rate, $R_i$. Each
user's transmission rate is $R_{tx} = R_i \cdot Q \cdot N$, where 1/Q is
the convolutional code rate and N is the spreading factor.
The receiver matches to each signature sequence and performs
maximum likelihood sequence detection.
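Because the transmission rate is the product of the information rate, the inverse code rate Q, and the spreading factor N, fixing the bandwidth expansion fixes the product Q·N, and the design question is how to split it between coding and spreading. The short sketch below simply enumerates the integer splits for a given expansion factor; the function name and the restriction to Q ≥ 2 are assumptions made for this illustration.

```python
def coding_spreading_splits(expansion):
    """List all (Q, N) pairs with Q * N == expansion and Q >= 2,
    where 1/Q is the code rate and N is the spreading factor."""
    return [
        (q, expansion // q)
        for q in range(2, expansion + 1)
        if expansion % q == 0
    ]


# Bandwidth expansion of 32, i.e. a transmission rate of 32 times
# the information rate.
splits_32 = coding_spreading_splits(32)
```

For an expansion factor of 32, the enumeration includes the pairs (Q, N) = (8, 4), (4, 8), and (2, 16).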

For fixed $R_i$ and $R_{tx}$ we optimize the Asymptotic Multiuser
Coding Gain (AMCG) with respect to Q and N. The
AMCG relates the energy gain for high SNR to the single-user
uncoded antipodal system, i.e. $\eta$ in the expression
$P_e = Q(\sqrt{2 E_b \eta / N_0})$, where Q(·) is the Marcum-Q function,
$E_b$ is the information bit energy, and $N_0$ is the one-sided noise
density. We have extended this measure, derived in [3] for
Q = 2 and fixed N, to arbitrary Q and N.

The  probability  of  error  for  the  kth  user  can  be 
bounded  by  V{gk{e)~gk  ,min  }Q{y/(2Etk/No)tik(e)}  <  Ve(k) 

<  J2oec  'bk/No)Vk{e)}  where  C  is  the 

codebook, e is any valid error sequence for the codeword
D, $\eta_k(e)$ is the energy gain of user k when the error event
e occurs, $\eta_{k,\min} = \min_e\{\eta_k(e)\}$, and $E_{bk}$ is the energy of
user k. For high SNR, $\eta_{k,\min}$ will dominate; hence it is
the AMCG. In the two-user case the AMCG is bounded by

$$\min\left( f\left(\sqrt{E_2/E_1},\, d_f,\, \xi\right),\; d_f/Q \right) \le \eta_{k,\min} \le \eta_k(e) \quad \text{for some valid } e,$$

where $E_1$ and $E_2$ are the two users' energies, $\xi$ is
the sum of the magnitudes of the two partial crosscorrelations,
$d_f$ is the free distance of the convolutional code, and
$f(\sqrt{E_2/E_1}, d_f, \xi) = \tfrac{1}{2}\left[d_f(1 + E_2/E_1) - 2\xi\sqrt{E_2/E_1}\,(d_f + 1)\right]$.

These bounds for two users are computed for M-sequences of
three different lengths in Fig. 1 (a) and (b). Plots (a) and (b)
represent $R_{tx} = 32 R_i$ and $R_{tx} = 64 R_i$, respectively. These
bounds were computed using the maximum partial crosscorrelations
over all delays between the two users. Fig. 1 (a) shows
that when the partial crosscorrelations approach 1, the system
with the lower coding rate may not show any improvement.
However, when the crosscorrelations are high but less than 1,
as in Fig. 1 (b), the lower rate codes perform as well as the
single-user detector, i.e. AMCG = ACG = $d_f/Q$ for all $E_2/E_1$.

Since high crosscorrelations between signature sequences
can prevent expected coding gains, we propose spreading and
despreading in the frequency domain, which was considered
for optical systems in [4]. Because delays appear as phase
factors in the frequency domain, we can find sequences for an
asynchronous system that are both short and have sufficiently
low crosscorrelations to allow coding gains.

Fig. 1: In (a), (c), and (d), for Q = 8, 4, 2: $d_f$ = 21, 10, and 5, and
N = 4, 8, and 16; for (b), $d_f$ = 21, 10, and 5 and N = 8, 16, and 32.
The crosscorrelations $\xi$ for Q = 8, 4, 2 are (a): 1.0, 0.71, 0.6; (b): 0.71,
0.6, 0.35; (c): 1.0, 0.43, 0.35; and (d): 0, 0, 0.

In this system the encoder multiplies each encoded bit by
the inverse FFT of a signature sequence that has low crosscorrelation
in the frequency domain. The decoder matches the
Fourier transform of each received symbol to the signature
sequence and sends the output to a maximum likelihood sequence
decoder. In Fig. 1 (c) and (d), we show the bounds
for the AMCG for two frequency domain codes with constant
rates, $R_{tx} = 32 R_i$. Fig. 1 (c) and (d) are computed
assuming worst case interference for frequency domain
M-sequences and Hadamard codes, respectively. These codes
show a great improvement over the time domain codes, and,
in fact, the Hadamard sequence achieves the single-user ACG
for all $E_2/E_1$. Although the Hadamard sequences outperform
the M-sequences, there are fewer available Hadamard
sequences for a given sequence length.
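The encoder and decoder operations described above (multiply each coded bit by the inverse transform of a signature, then match the transform of the received symbol to that signature) can be illustrated with a small noiseless, symbol-synchronous example. This is only a sketch under simplifying assumptions: a direct O(N²) DFT stands in for the FFT, user delays and noise are not modelled, and the function names and the length-4 Hadamard rows used as signatures are hypothetical choices for this example.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of the list x."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Direct inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def spread(bit, signature):
    """Time-domain symbol: the coded bit times the inverse DFT of the signature."""
    return [bit * chip for chip in idft(signature)]


def despread(received, signature):
    """Correlate the DFT of the received symbol with the signature."""
    spectrum = dft(received)
    return sum(r * s.conjugate() for r, s in zip(spectrum, signature)) / len(signature)


# Two users with orthogonal frequency-domain signatures (Hadamard rows).
h1 = [1, 1, -1, -1]
h2 = [1, -1, 1, -1]
received = [a + b for a, b in zip(spread(+1, h1), spread(-1, h2))]
```

With orthogonal signatures, `despread(received, h1)` and `despread(received, h2)` return each user's bit without interference from the other user (up to floating-point rounding).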

The asymptotic multiuser coding gain of a CDMA system
can achieve the single-user coding gain when the crosscorrelations
between users are low. However, since low crosscorrelations
between short signature sequences are difficult to obtain
in the time domain in an asynchronous system, frequency domain
signature sequences are a viable alternative.
Abstract — We propose an analytical method to upper
bound the bit error probability of parallel concatenated
block and convolutional codes.


I.  Introduction 

The so-called turbo codes [1], which in the following we will
call parallel concatenated convolutional codes (PCCC), consist
of two linear, generally simple convolutional codes (the
constituent codes, CC) linked by an interleaver as shown in
Fig. 1. In [1], PCCC's with appropriate choices of the CC's
and of the interleaver have been shown to yield coding gains
close to those predicted by the Shannon limit, while keeping the
complexity of an "ad hoc" iterative soft-decoding procedure
significantly low and comparable to that of the CC's. These
results have been further reinforced by [2]. Despite the astonishing
performance of the turbo codes, however, neither serious
attempts toward a theoretical explanation of the codes'
behavior and performance nor a sufficient comprehension of the role
and relative importance of the ingredients of a PCCC have
appeared in the literature so far. In this paper, we propose
an analytical method to upper bound the error probability of
a PCCC, and use it to shed light on important issues raised
by these new coding schemes.


II. An analytical upper bound to the bit error probability of PCCC's

Fig. 1 shows clearly the discouraging complexity of any attempt
to obtain the weight enumerating function of a
PCCC, especially when the length N of the interleaver is large
(say 1000-10000), as it should be to yield good performance.
The only viable solution to the problem seems to pass through
an appropriate and meaningful way of making independent
the weights of the parity checks generated by the first and
second encoders. To this end, we define a uniform interleaver
as a probabilistic device which maps a given input information
sequence of length N and weight w into all distinct $\binom{N}{w}$
permutations with equal probability $1/\binom{N}{w}$. Use of this device,
instead of the actual interleaver, makes the weight enumerating
functions $A_w^{C_1}(Z)$ and $A_w^{C_2}(Z)$ of the parity checks
generated by the two encoders, conditioned on a given weight
w of the input sequence, independent. As a consequence, the
conditional weight enumerating function of the parity check
bits of the whole PCCC, $A_w^{C_p}(Z)$, can be easily obtained as

$$A_w^{C_p}(Z) = \frac{A_w^{C_1}(Z)\, A_w^{C_2}(Z)}{\binom{N}{w}}$$


and, from it, an upper bound to the bit error probability can
be written in the form

$$P_b(e) \le \sum_{w=1}^{N} \frac{w}{N}\, W^{w} A_w^{C_p}(Z) \Big|_{W = Z = e^{-R_c E_b/N_0}}$$

where $R_c$ is the rate of the PCCC. Previous results refer
to an (N − L, 3N) block code equivalent to the PCCC and
obtained from it considering input information sequences of
length N − L and codewords of length 3N, where L is the
constraint length of the CC's, generated by terminating the
trellises of the two CC's. Extensions to the case of continuous
PCCC can be done [3].

This work was supported by European Space Agency and
by CNR under Progetto Finalizzato Trasporti, sub-project
Prometheus.
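Under the uniform interleaver, the conditional weight enumerating function of the parity bits of the PCCC is the product of the two constituent conditional enumerators divided by the number of weight-w input sequences. The sketch below carries out that polynomial product with exact arithmetic. Representing an enumerator as a dictionary from parity weight to multiplicity, the function name, and the toy enumerator in the example are assumptions of this illustration and do not correspond to any particular constituent code.

```python
from fractions import Fraction
from math import comb


def pccc_conditional_wef(a1, a2, n, w):
    """Combine two conditional parity-weight enumerators for input weight w.

    a1, a2: dicts mapping parity weight j to the number of weight-w inputs
            of length n that produce parity weight j in each encoder.
    Returns a dict mapping total parity weight to its (fractional)
    multiplicity, i.e. the coefficients of A1(Z) * A2(Z) / C(n, w).
    """
    norm = comb(n, w)
    combined = {}
    for j1, c1 in a1.items():
        for j2, c2 in a2.items():
            weight = j1 + j2
            combined[weight] = combined.get(weight, Fraction(0)) + Fraction(c1 * c2, norm)
    return combined


# Toy enumerator (hypothetical): n = 3, w = 1, so C(3, 1) = 3 inputs in total.
toy = {1: 1, 2: 2}
toy_pccc = pccc_conditional_wef(toy, toy, n=3, w=1)
```

When each constituent enumerator accounts for all $\binom{N}{w}$ inputs of weight w, the combined multiplicities again sum to $\binom{N}{w}$, which is a useful consistency check.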

III.  The  role  of  interleaver  and  CC’s 

Use of the uniform interleaver permits a separation of the
effects of the interleaver length and of the CC's on the performance
of the PCCC. Using our analytical tools, we see
that, for large N and within the limits of validity of the upper
bounds, the interleaver provides an interleaver gain which decreases
the bit error probability by a factor 1/N. Moreover,
we prove that this gain can be obtained only if the CC's are
recursive convolutional codes, and that this is due to their
particular weight profile, characterized by the fact that
input sequences of weight w = 1 do not produce error events
of finite length. Finally, by extensive simulations, we validate
the upper bounds based on the uniform interleaver, showing
that an interleaver chosen as a random permutation is likely
to yield bit error probabilities very close to those anticipated
by the bounds.

As to the role of the recursive CC's (defined by the generating
function (1, n(D)/d(D)) for the case of rate 1/2), we have
shown that a reasonable design criterion consists in choosing
the polynomial d(D) defining the feedback connections as
a primitive polynomial, and that the choice of the numerator
n(D) should aim at maximizing the weight of the parity checks
for input information sequences of minimum weight w = 2.
I. Introduction

Several parallel concatenated coding schemes (turbo codes)
based on multi-memory (MM) convolutional codes (more
specifically, a (2,1,4,7) code) were recently proposed to
achieve near Shannon-limit error correction performance with
reasonable decoding complexity [1]-[3]. On the other hand,
in many cases of interest, unit-memory (UM) codes have been
demonstrated to have larger free distances than the MM codes
with the same rate and the same number of memory elements
[4]. In this paper, new turbo codes based on the (8,4,3,8) UM
Hamming code [4] will be developed and shown to possess better
performance potential in some senses. The standard turbo
decoding algorithms, however, do not appear to achieve this
potential.


II.  Encoder 

An  equivalent  systematic  recursive  generator  matrix  for  the 
UM  Hamming  code  can  be  obtained  by  first  properly  permut¬ 
ing  the  columns  and  then  multiplying  on  the  left  by  the  inverse 
of  the  left-most  4x4  sub-matrix  of  the  original  generator  ma¬ 
trix: 
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The  corresponding  encoder  can  be  implemented  with  three 
memory  elements.  The  encoder  for  the  UM  turbo  (UMT) 
code is similar to those for the MMT codes [1]-[3], except that
there  are  multiple  inputs  to  the  encoder  of  the  component 
codes.  The  trellis  is  terminated  using  the  method  of  [3].  Since 
the  systematic  bits  from  the  second  encoder  are  discarded,  the 
overall code rate is K/(3(K + 4)), where K is the interleaver
size. 


III. The MAP Algorithm for Multi-Input Recursive Trellis Codes

In this section, a modified MAP algorithm is presented to
deal with multiple inputs. Let the state of the encoder for
the $(n, k, \nu)$ code at time $t$ be $S_t \in \{0, 1, \ldots, 2^{\nu} - 1\}$, for
$t = 0, \ldots, L = K/k$, where the initial and final states, $S_0$ and
$S_L$, are known. The input block $u_t = (u_{t,1}, \ldots, u_{t,k})$ causes
a transition from $S_{t-1}$ to $S_t$, and the corresponding output
codeword $x_t = (x_{t,1}, \ldots, x_{t,n})$ is observed over an AWGN
channel as $y_t = (y_{t,1}, \ldots, y_{t,n})$, for $t = 1, \ldots, L$. The log
likelihood ratios of the a posteriori probabilities can be computed
as:

$$L(u_{t,j}) = \log \frac{\sum_{s} \sum_{s'} \gamma^{1}_{t,j}(s', s)\, \alpha_{t-1}(s')\, \beta_t(s)}{\sum_{s} \sum_{s'} \gamma^{0}_{t,j}(s', s)\, \alpha_{t-1}(s')\, \beta_t(s)}$$

where, if the transition $s' \to s$ is allowed by input $u_{t,j} = i$,

$$\gamma^{i}_{t,j}(s', s) = \Pr\{u_{t,j} = i\}\, \Pr\{y_t \mid S_t = s, u_{t,j} = i, S_{t-1} = s'\}$$

$$\alpha_t(s) = \sum_{s'} \Gamma_t(s', s)\, \alpha_{t-1}(s') \quad \text{for } t = 1, \ldots, L$$

$$\beta_t(s) = \sum_{s'} \Gamma_{t+1}(s, s')\, \beta_{t+1}(s') \quad \text{for } t = L-1, \ldots, 0$$

$$\Gamma_t(s', s) = \sum_{i} \Pr\{u_t = i\}\, \Pr\{y_t \mid S_t = s, u_t = i, S_{t-1} = s'\}$$
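
A forward-backward computation of this kind can be sketched as follows. This is an illustrative sketch only, not the decoder used in the paper: the 2-state, 2-input toy trellis, the BPSK mapping 0 -> +1, 1 -> -1, uniform input priors, and the names toy_trellis, encode and map_llrs are all assumptions. Each branch carries a whole input block, and the log likelihood ratio of input bit j sums the branch terms over all branches whose j-th input bit is 1 (numerator) or 0 (denominator).

```python
import math
from itertools import product


def toy_trellis():
    """Hypothetical 2-state, 2-input recursive code with outputs (u1, u2, s_next)."""
    branches = []
    for s_prev in (0, 1):
        for u in product((0, 1), repeat=2):
            s_next = s_prev ^ u[0] ^ u[1]
            branches.append((s_prev, u, s_next, (u[0], u[1], s_next)))
    return branches


def encode(blocks, branches, s0=0):
    """Run the trellis; returns the output blocks and the final state."""
    step = {(sp, u): (sn, x) for sp, u, sn, x in branches}
    s, out = s0, []
    for u in blocks:
        s, x = step[(s, tuple(u))]
        out.append(x)
    return out, s


def map_llrs(y, branches, n_states, s0, s_final, sigma2):
    """Per-bit log likelihood ratios for known initial and final states."""
    L, k = len(y), len(branches[0][1])

    def gamma(t, x):
        # AWGN branch likelihood; uniform priors are a common factor and dropped.
        d2 = sum((yv - (1 - 2 * xv)) ** 2 for yv, xv in zip(y[t], x))
        return math.exp(-d2 / (2 * sigma2))

    alpha = [[0.0] * n_states for _ in range(L + 1)]
    beta = [[0.0] * n_states for _ in range(L + 1)]
    alpha[0][s0] = 1.0
    beta[L][s_final] = 1.0
    for t in range(1, L + 1):                      # forward recursion
        for sp, u, sn, x in branches:
            alpha[t][sn] += alpha[t - 1][sp] * gamma(t - 1, x)
        total = sum(alpha[t])
        alpha[t] = [a / total for a in alpha[t]]   # normalise for stability
    for t in range(L - 1, -1, -1):                 # backward recursion
        for sp, u, sn, x in branches:
            beta[t][sp] += beta[t + 1][sn] * gamma(t, x)
        total = sum(beta[t])
        beta[t] = [b / total for b in beta[t]]
    llrs = []
    for t in range(1, L + 1):
        row = []
        for j in range(k):
            num = den = 0.0
            for sp, u, sn, x in branches:
                m = alpha[t - 1][sp] * gamma(t - 1, x) * beta[t][sn]
                if u[j]:
                    num += m
                else:
                    den += m
            row.append(math.log(num / den))
        llrs.append(row)
    return llrs


branches = toy_trellis()
blocks = [(1, 0), (0, 1), (1, 1), (0, 0), (1, 0)]
x, s_final = encode(blocks, branches)
y = [[1.0 - 2 * b for b in xt] for xt in x]        # noiseless observation
llrs = map_llrs(y, branches, 2, 0, s_final, 0.5)
decoded = [tuple(int(v > 0) for v in row) for row in llrs]
```

The sketch works in the probability domain with per-step normalisation; a practical decoder would more likely work in the log domain.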


IV.  Decoder  and  Performance 
The  decoder  structure  used  is  similar  to  that  in  [2]  except 
that  the  MAP  algorithm  in  III  is  applied  instead.  Numerical 
results  are  shown  in  Fig.  1  and  summarized  as  follows: 

•  The  minimum  distance  of  the  (60, 16)  UMT  code  with 
the  best  known  interleaver  is  14.  Maximum-likelihood 
decoding  simulation  of  this  code  shows  a  gain  of  0.5  dB 
over  the  (80, 16)  MMT  code  [3]  which  has  the  same  min¬ 
imum  distance.  The  use  of  turbo  decoding  introduces  a 
loss  of  about  1.5  dB. 

•  For  large  block  lengths,  simulation  results  show  that 
the  turbo  decoding  algorithm  converges  faster  than  that 
for  MMT  codes,  but  the  performance  is  not  as  good. 
Comparing  these  with  the  transfer  bounds  computed 
with  a  double  recursion  method  and  a  random  averaging 
argument  [5],  a  gap  of  coding  gain  with  turbo  decoding 
as  in  the  previous  case  can  be  observed  again. 
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Figure  1:  Performance  of  unit-memory  turbo  codes. 
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Abstract  -  In  this  paper  an  analytical  approach  to  newly  invented 
Turbo-Codes  (TC)  is  presented.  That  approach  is  based  on 
evaluating  the  properties  of  TC  by  means  of  the  Minimum 
Hamming  Distance  (MHD)  and  the  Hamming  Distance  Spectrum 
(HDS).  An  algorithm  for  computing  HDS  is  presented  and 
numerical  results  are  discussed.  The  concept  of  basic  return- 
to-zero  sequence  is  introduced.  It  is  shown  how  basic  return-to- 
zero  sequences  can  be  used  in  the  algorithm  for  computing 
HDS  and  how  it  can  justify  the  properties  of  TC.  Numerical 
results  of  computing  MHD  and  HDS  for  different  TCs  are 
presented and verified by simulations.

I.  INTRODUCTION 

TC  seem  to  be  very  attractive  for  applications  in  practical 
communication  systems,  since  their  error  performance  is  close  to 
the  Shannon  limit  [1,  2].  During  the  last  two  years  some 
modifications  of  the  originally  proposed  parameters  of  both  the 
encoder  and  decoder  of  the  turbo-code,  have  been  proposed, 
which  lead  to  the  improvement  of  the  turbo-code  performance.  In 
most  cases  the  performance  of  turbo-codes  has  been  evaluated  by 
means  of  simulation.  In  this  paper  we  show  that  the  properties  of 
turbo-codes  can  be  predicted  by  means  of  MHD  and  HDS.  We 
describe  a  method  to  efficiently  calculate  the  MHD  and  HDS  of  the 
turbo-codes,  provided  that  the  interleaver  size  is  not  larger  than 
16x16. This procedure is a modification of the well known Fano
algorithm. We introduce the concept of a basic return-to-zero
sequence. We show how basic return-to-zero sequences can be
used in the Fano algorithm for computing the HDS. We also show how
the properties of basic return-to-zero sequences
can justify the properties of TC.

II.  DESCRIPTION  OF  THE  SYSTEM 
The scheme of a turbo-encoder is given in Fig. 1 [1]. The turbo-encoder
consists of two Recursive Systematic Coders (RSC), an Interleaver (I)
and a puncturing circuit. Both RSC encoders are identical rate-1/2
convolutional encoders. In our study we have considered the RSC
encoders (23,35), (7,5), (5,7), (15,17), (5,7) and (1,1).² The puncturing
pattern used by us is the following: we transmit bit Y0 without any
change, and bits Y1 and Y2 are punctured alternately, each being sent
only every second time. Thus the overall rate of the TC is 1/2 and the
transmitted sequence is: Y0, Y1, Y0, Y2, Y0, Y1, ... .
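
The multiplexing implied by this puncturing pattern can be sketched as follows. The function name puncture_rate_half is hypothetical, and the sketch assumes that Y1 is kept at even time indices and Y2 at odd ones, which reproduces the transmitted order stated above.

```python
def puncture_rate_half(y0, y1, y2):
    """Multiplex Y0 with alternately punctured Y1 and Y2.

    Every systematic bit Y0 is sent; Y1 is sent at even time indices and
    Y2 at odd time indices, giving two transmitted bits per information bit.
    """
    out = []
    for t, (s, p1, p2) in enumerate(zip(y0, y1, y2)):
        out.append(s)
        out.append(p1 if t % 2 == 0 else p2)
    return out


# Symbolic labels make the transmitted order visible.
sent = puncture_rate_half(
    ["Y0[0]", "Y0[1]", "Y0[2]", "Y0[3]"],
    ["Y1[0]", "Y1[1]", "Y1[2]", "Y1[3]"],
    ["Y2[0]", "Y2[1]", "Y2[2]", "Y2[3]"],
)
```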

III. AN ALGORITHM FOR COMPUTING HDS
The  algorithm  used  by  us  for  computing  HDS  of  the  turbo-codes  is 
the modified Fano algorithm. The modification is the following: we use
the  fact  that  in  order  for  a  turbo-coder  to  return  to  the  all-zero-state, 
both  RSC  encoders  must  come  to  the  zero  state.  So,  instead  of 
applying  to  the  input  of  the  turbo-code  arbitrary  binary  sequences, 
we  feed  it  only  with  some  selected  sequences  which  are  known  to 
force  RSC1  to  come  to  the  zero  state,  so  called  return-to-zero 
sequences.  Basic  return-to-zero  sequences  are  defined  as  those 
return-to-zero  sequences  which  are  not  a  linear  combination  of  other 
return-to-zero sequences. We have proven that for any recursive
code there exists only one basic return-to-zero sequence. For
example, for RSC (5,7) the basic return-to-zero sequence is x=[1 0 1],
for RSC (7,5) it is equal to x=[1 1 1].

² Generating polynomials are given in the octal notation.
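
The two examples can be checked with a short simulation of the feedback register. This is a sketch under an assumption suggested by the examples themselves: the first octal number of each pair is taken as the feedback polynomial. The helper names octal_taps and final_state are hypothetical.

```python
def octal_taps(g):
    """Binary tap list of a generator given in octal notation, e.g. 5 -> [1, 0, 1]."""
    return [int(b) for b in bin(int(str(g), 8))[2:]]


def final_state(u, d):
    """State of the feedback shift register with taps d after input u.

    Starts in the all-zero state; the state holds a_{t-1}, ..., a_{t-m}.
    """
    m = len(d) - 1
    state = [0] * m
    for bit in u:
        a = bit
        for i in range(m):
            a ^= d[i + 1] & state[i]
        state = [a] + state[:-1]
    return state


rtz_57 = final_state([1, 0, 1], octal_taps(5))   # RSC (5,7), x = [1 0 1]
rtz_75 = final_state([1, 1, 1], octal_taps(7))   # RSC (7,5), x = [1 1 1]
wrong = final_state([1, 1, 1], octal_taps(5))    # x = [1 1 1] fed to RSC (5,7)
```

Under this assumption an input equal to the feedback polynomial drives the register back to the zero state, whereas the sequence belonging to the other code does not.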


Fig. 1. The scheme of the turbo-encoder.

IV.  BASIC  RETURN-TO-ZERO  SEQUENCE 
Basic  return-to-zero  sequence  can  always  help  in  rejecting  "bad" 
RSC  encoders.  We  have  shown  that  for  any  TC  (whatever  the  size 
or  kind  of  interleaving  is)  with  RSC  (5,7)  one  basic  return-to-zero 
sequence  can  drive  TC  to  the  all-zero-state.  For  such  TC  when  we 
use  the  puncturing  pattern  presented  in  Fig.  2  the  weight  of  an 
output  sequence  of  TC  is  always  equal  to  5.  Thus  for  that  particular 
RSC  code  and  puncturing  pattern  any  changes  in  the  size  of  the 
interleaver  or  introducing  non-uniformity  to  the  interleaver,  will  not 
lead  to  the  increase  of  MHD. 

V.  CONCLUSIONS 

We  have  computed  HDS  for  a  range  of  TC,  for  different  RSC  codes, 
different  interleavers  (sizes  up  to  16x16,  both  uniform  and  non- 
uniform).  We  have  also  verified  our  results  by  simulation. 
Conclusions  of  our  study  are  the  following: 

•  Simulation  results  show  that  analytical  approach  by  using 
MHD  and  HDS  can  be  used  for  evaluating  the  properties  of 
turbo-codes. For example, for TC with RSC (7,5) and the
interleaver I=8x8 the difference between simulation and
analytical results for BER=10^-5 is 0.22 dB for uniform
interleaving and 0.7 dB for the non-uniform one.

•  The  number  of  elements  in  the  HDS  which  must  be  taken  into 
account  does  not  exceed  3  (sometimes  one  spectrum  element 
is  sufficient).  For  example  for  TC  with  RSC  (7,5)  and  uniform 
interleaver I=8x8 the difference between BER curves for 1 and
3  (or  more)  elements  is  0.4  dB  for  BER=10^  and  0  dB  for 
BER=10^-6. There is no difference in BER between 3 or more
elements. 

• The BER performance of the turbo-code can be improved by:

- increasing the constraint length of the RSC code. For
example, for BER=10^-6 and uniform interleaver I=8x8 the TC
with RSC (23,35) is better than TC with RSC (15,17) by 3.2
dB, and outperforms TC with RSC (1,1) by about 5.8 dB.

- increasing the size of the interleaver. For example, for TC
with RSC (7,5) for BER=10^-6 TC with I=8x16 outperforms TC
with I=8x8 by 2.4 dB and TC with no interleaving by 1.6 dB.

- introducing non-uniformity to the interleaver.

•  For  any  Recursive  Code  there  exists  only  one  basic  return-to- 
zero  sequence.  By  analyzing  the  properties  of  basic  return-to- 
zero  sequence  we  may  find  "bad"  codes.  The  problem  which  is 
still  open  is  how  basic  return-to-zero  sequence  can  be  used 
for  designing  TC  which  would  possess  very  good  properties 
(i.e.  large  MHD  value). 
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Abstract—  We  develop  b/n  multiple  turbo  codes  and 
an  iterative  turbo  decoding  scheme  based  on  an  ap¬ 
proximation  to  the  optimum  bit  decision  rule  (MAP). 
For  random  interleaver  size  of  16384  bits,  a  bit  error 
probability of 10^-5 at a required Eb/N0 of about 0.8 dB
from  the  binary-input  channel  capacity  for  rate  b/n 
was  obtained  for  various  turbo  codes.  Examples  are 
given  for  rate  b/n  =1/2,  1/3,  1/4  and  2/6  turbo  codes 
using  component  codes  with  up  to  16  states. 


I.  Introduction 

Turbo codes were recently proposed by Berrou, Glavieux and Thitimajshima
[1]. We propose rate b/n codes that consist of the parallel
concatenation of q systematic recursive convolutional codes, with
random interleavers of size N between rate $b/n_i$ encoders, such that
$n = \sum_{i=1}^{q} n_i$. Encoding and decoding is done block by block.
Encoders are forced to the all-zero state at the end of each block by a
simple termination method [4].


II.  Turbo  Decoding  for  Multiple  Codes 

Let $u_k$ be a binary random variable taking values in {0, 1}, representing
the sequence of information bits $u = (u_1, \ldots, u_{Nb})$. This sequence
is partitioned into N groups of b bits representing input symbols.
Bit-by-bit, rather than symbol-by-symbol, interleaving is performed.

The modified MAP algorithm [5] provides the log likelihood ratio
$L_k = \log \frac{P(u_k = 1 \mid y)}{P(u_k = 0 \mid y)}$ given the received symbols y, where


Lk 
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Consider  the  parallel  concatenation  of  q  codes.  The  combination  of 
permuter and systematic recursive convolutional code is considered
as a block code with input u and output $x_j$, j = 1, 2, ..., q. The
components of $x_j$ may be binary or non-binary. For the non-binary
case multilevel modulation is used, resulting in turbo trellis coded
modulation (TTCM). The corresponding received sequences are $y_j$.

The  optimum  bit  decision  rule  (MAP)  for  data  with  uniform  prob¬ 
abilities  is 

$$L_k = \log \frac{\sum_{u:\, u_k = 1} \prod_{j=1}^{q} P(y_j \mid u)}{\sum_{u:\, u_k = 0} \prod_{j=1}^{q} P(y_j \mid u)} \qquad (2)$$

An approximation to $P(y_j \mid u)$ was used in [4] to obtain (2) as
$L_k = \sum_{j=1}^{q} L_{jk}$, where the $L_{jk}$ are iterative solutions to a set of non-linear
equations that can be efficiently computed using the MAP algorithm
with pre-interleaving and post-deinterleaving,
for k = 1, 2, ..., Nb and j = 1, 2, ..., q. Then $L_{jk} = \lim_{m \to \infty} L_{jk}^{(m)}$.

All initial conditions are set to zero, i.e. $L_{jk}^{(0)} = 0$.

¹The research described in this paper was performed at the Jet Propulsion
Laboratory,  California  Institute  of  Technology,  under  contract  with  the  National 
Aeronautics  and  Space  Administration. 


III.  Performance 

The  bit  error  rate  performance  of  these  codes  was  evaluated  by  using 
transfer function bounds [3], [2]. In [2] it was shown that transfer
function  bounds  are  very  useful  for  signal-to-noise  ratios  above  the 
cutoff  rate  threshold  and  that  they  cannot  accurately  predict  perfor¬ 
mance  in  the  region  between  cutoff  rate  and  capacity.  In  this  region, 
the  performance  was  computed  by  simulation. 

The figure below shows the performance of turbo codes with
the following generators: for two K = 5 constituent codes,
(1, gb/ga, gc/ga) and (gb/ga), with ga = (37)octal, gb = (33)octal
and gc = (25)octal; for three K = 3 codes, (1, gb/ga) and (gb/ga)
with ga = (7)octal and gb = (5)octal; for three K = 4 codes, (1, gb/ga)
and (gb/ga) with ga = (13)octal and gb = (11)octal.

Further results at BER=10^-5 were obtained for two constituent
codes with interleaving size N = 16384 as follows. For a rate 1/2
turbo code using two codes, K = 2 (differential encoder) with (gb/ga)
where ga = (3)octal and gb = (1)octal, and K = 5 with (gb/ga) where
ga = (23)octal and gb = (33)octal, the required bit SNR was 0.85 dB.
For rate 1/3, we used two K = 5 codes, (1, gb/ga) and (gb/ga) with
ga = (23)octal and gb = (33)octal and obtained bit SNR = 0.25 dB.
For rate 1/4, we used two K = 5 codes with (1, gb/ga, gc/ga) and
(gb/ga) with ga = (23)octal, gb = (33)octal and gc = (25)octal and
obtained bit SNR = 0 dB. For a rate 2/6 turbo code each constituent
code is constructed from two parallel K = 3 codes (1, gb1/ga, gc1/ga)
and (1, gb2/ga, gc2/ga) where the output of gb1/ga is added to the
output of gb2/ga and the output of gc1/ga is added to the output of
gc2/ga; ga = (7)octal, gb1 = (6)octal, gc1 = Woctal, gb2 = (7)octal,
gc2 = (4)octal. The resulting code has 16 states with two inputs and
four outputs. The second code is identical to the first one but not using
the systematic bits. BER=10^-5 was obtained at bit SNR = 0.2 dB.
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Abstract  -  The  performance  of  the  ’turbo’  coding  scheme  is 
measured  and  an  error  floor  is  discovered.  These  residual  errors 
are  corrected  with  an  outer  BCH  code.  The  complexity  of  the 
system  is  discussed,  and  for  low  data  rates  a  realizable  system 
operating  at  Eb/N0  below  0.2  dB  is  presented. 

I.  Introduction 

Recently  it  has  been  discovered  that  a  very  good  performance  can  be 
achieved  with  iterative  decoding  of  a  parallel  concatenation  of  small 
convolutional  codes  [1].  This  coding  scheme  is  named  ’turbo'  coding. 
The  basic  idea  is  to  encode  the  information  sequence  twice,  the  second 
time  after  a  pseudo-random  interleaver,  and  to  do  iterative  decoding 
on  the  two  encoded  sequences  in  two  decoders.  The  system  can  be 
regarded  as  a  kind  of  product  code.  Due  to  the  information  exchange 
among  the  two  decoders  the  decoding  algorithm  must  provide  soft 
output.  We  use  the  MAP  algorithm  [2]  which  actually  calculates  the 
a  posteriori  probability  of  each  information  bit.  The  convolutional 
codes  are  used  in  a  recursive  systematic  form  since  it  gives  an  im¬ 
proved  performance  with  this  system. 

II.  The  Error  Floor 

The  first  simulations  were  based  on  the  recursive  systematic  code 
(1, (1+D^4)/(1+D+D^2+D^3+D^4)). We use the same code for both
encoders  but  for  the  second  one  the  information  sequence  is  not 
transmitted.  This  gives  an  overall  rate  of  1/3.  We  use  a  block  length 
of  10384  information  bits.  For  all  simulations  presented  in  this  paper 
all  numbers  including  the  channel  input  are  represented  as  floating 
point  values. 

As  seen  from  Figure  1 ,  the  results  achieved  with  this  system  are  very 
promising  since  the  Bit  Error  Rate  (BER)  after  18  iterations  is  close 
to  10  already  at  0.2  dB.  Unfortunately  the  BER  decreases  very 
slowly  for  improved  SNR.  What  we  see  are  many  frames  with  only 
a  few  bit  errors.  This  is  due  to  the  low  free  distance  of  this  coding 
scheme.  The  free  distance  of  this  system  might  be  as  low  as  10.  The 
actual  profile  depends  on  the  specific  interleaver. 

A  search  for  better  interleavers  might  give  improved  performance. 
However,  the  main  problem  is  combinations  of  two  low  weight  words 
for  the  basic  code.  Consequently  the  performance  with  interleaver 
structures  like  block  interleavers  is  quite  poor,  and  a  search  among 
the  random  interleavers  can  only  remove  a  couple  of  the  worst  low 
weight  patterns. 

III.  The  Extended  ’turbo'  Coding  Scheme 

An  obvious  way  to  remove  the  error  floor  (or  saddle)  is  to  use  an  outer 
code.  Since  the  bursts  consist  of  very  few  bit  errors,  we  will  use  a 
(10384,10000)  BCH  code  capable  of  correcting  24  errors.  This  outer 
code corrects all the residual errors, but we lose 0.16 dB due to the
decreased rate. With this system the Probability of Frame Loss (PFL)
is below 10^-4 at 0.4 dB.
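
The quoted rate loss follows directly from the code parameters. As a quick check, assuming the loss is measured as the ratio of transmitted to information bits of the (10384,10000) outer code expressed in dB (variable names are illustrative):

```python
import math

n_bch, k_bch = 10384, 10000          # outer BCH code parameters from the text

# Energy per information bit rises by n/k when the outer code is added.
rate_loss_db = 10 * math.log10(n_bch / k_bch)

# Overall rate of the rate-1/3 'turbo' scheme combined with the outer code.
overall_rate = (k_bch / n_bch) * (1 / 3)
```

The computed loss is about 0.16 dB, in agreement with the figure given above.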


Improved  performance  can  be  achieved  with  a  system  based  on  rate 
1/3  codes  with  only  8  states.  This  gives  rate  1/5  for  the  'turbo'  coding 
scheme.  In  this  case  we  have  also  used  the  outer  BCH  code. 

With this system we have simulated 25,000 frames without frame loss
at 0.1 dB. This means that at the 90% confidence level the PFL is
below 10^-4. The BER is shown in Figure 1.
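
The confidence statement can be reproduced with a one-line calculation. The sketch below assumes independent frames and a one-sided bound: if the true PFL were p, the probability of observing no loss in n frames is (1 - p)^n, and the 90% upper bound is the p for which this probability equals 0.1. The variable names are illustrative.

```python
n_frames = 25_000
confidence = 0.90

# Solve (1 - p)^n = 1 - confidence for p.
pfl_upper = 1.0 - (1.0 - confidence) ** (1.0 / n_frames)
```

The resulting bound is roughly 9.2 x 10^-5, which is consistent with the claim of a PFL below 10^-4.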

IV.  Complexity 

The  performance  must  of  course  be  compared  to  the  complexity.  We 
have  estimated  the  number  of  operations  needed  in  the  MAP  algorithm 
for  recursive  systematic  codes  and  conclude  that  this  is  about  4  times 
the  number  of  operations  in  a  Viterbi  decoder.  This  means  that  the 
number of operations for 18 iterations with M = 3 codes is on the order
of 2^12. We believe that with a logarithmic quantization an 8 bit representation
is sufficient for the internal representation in the MAP decoder.
With  this  quantization  and  channel  input  quantized  in  16  levels  we 
expect  a  performance  degradation  about  0.1  dB. 

For  low  data  rates  the  ’turbo’  coding  scheme  can  be  implemented  with 
only  one  MAP  decoder  (used  2x18  times),  and  the  decoder  for  the 
BCH  code  can  be  implemented  on  a  DSP.  Further  the  calculations 
inside  the  MAP  decoder  can  be  serialized,  using  the  same  hardware 
for  each  state. 

This  means  that  for  data  rates  below  100  kbit/s  the  complexity  of  this 
system  is  moderate,  and  the  extended  ’turbo’  coding  scheme  might 
be  an  alternative  to  ordinary  concatenated  systems. 

References. 

[1] C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon
Limit Error-correcting Coding and Decoding: Turbo-codes (1)",
Proc. ICC '93, May 1993, pp. 1064-1070.

[2]  L.  R.  Bahl,  J.  Cocke,  F.  Jelinek  and  J.  Raviv,  "Optimal  Decoding 
of  Linear  Codes  for  Minimizing  Symbol  Error  Rate",  IEEE 
Transactions  on  Information  Theory,  Vol.  IT-20,  March  1974,  pp. 
284-287. 


Figure 1: BER (no quantization). Curves: M=4, overall rate 1/3, 18 iterations;
M=4, overall rate 1/3, BCH, 18 iterations; M=3, overall rate 1/5, BCH, 18 iterations.




Interleaver  Design  for  Three  Dimensional  Turbo  Codes 


Adrian S. Barbulescu and Steven S. Pietrobon

Satellite Communications Research Centre, University of South Australia, The Levels SA 5095, Australia


Abstract  —  A  new  bandwidth  efficient  interleaver  is  de¬ 
scribed  for  turbo  codes  when  used  to  decode  short  frames 
of  data  using  the  MAP  algorithm.  Applications  in  rate 
compatible  turbo  codes  and  encryption  are  presented. 

I.  Introduction 

It is well known that the interleaver design is the key to
achieving the best performance for turbo codes [1]. For very
large frame sizes, random interleavers are near optimum. For
small frame sizes - for which the interleaver depth is less than
ten times the constraint length of the component convolutional
code - a random interleaver is not the best choice. In the
following, we consider a three dimensional turbo-code
(3D-TC) shown in Figure 1 which has the feedback polynomial
equal to all ones.

II.  Design  Criteria 

In order to use a maximum a posteriori (MAP) decoding
algorithm [2], the initial and the final state of each one dimensional
encoder should be fixed for all three coded sequences.
This could be achieved by appending three different “tails”,
one for each coded sequence, which will reduce the bit rate. A
new interleaver type called a “simile” interleaver was described
in [3] for a two-dimensional turbo-code which needs
only one “tail” to be appended. A similar method will be used
to create a “simile” interleaver for a 3D-TC.

We denote by v the encoder memory size of each one dimensional
encoder. We can rearrange the whole block of N information
bits into v + 1 sequences according to the bit index modulo (v + 1).
The important advantage in doing this is that from the point of view
of the final encoder state, the order of the individual bits in each sequence
does not matter as long as they belong to the same sequence.
The “simile” interleaver has to perform the interleaving of the
bits within each particular sequence in order to drive the encoder
into the same state as without interleaving. In [3] we described
a particular block helical interleaver. This can be extended
to 3D-TC by assuming that the number of columns is
a multiple of (v + 1). The information sequence is stored row-wise
and the two interleaved sequences start from the left
corners: bottom left corner and up the diagonal for interleaver
Ia and top left corner and down the diagonal for interleaver Ib.
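
The property that the final encoder state depends only on which mod (v + 1) sequence each bit belongs to can be checked numerically for the all-ones feedback polynomial. The sketch below is illustrative and is not the block helical interleaver of [3]: it uses a random permutation restricted to each index class, and the names final_state and simile_permutation are hypothetical.

```python
import random


def final_state(u, d):
    """State of the feedback register with taps d after input u (zero start)."""
    m = len(d) - 1
    state = [0] * m
    for bit in u:
        a = bit
        for i in range(m):
            a ^= d[i + 1] & state[i]
        state = [a] + state[:-1]
    return state


def simile_permutation(n, v, rng):
    """Random permutation that only moves bits between positions that have
    the same index modulo (v + 1)."""
    perm = list(range(n))
    for r in range(v + 1):
        positions = list(range(r, n, v + 1))
        shuffled = positions[:]
        rng.shuffle(shuffled)
        for src, dst in zip(positions, shuffled):
            perm[dst] = src
    return perm


v = 4
feedback = [1] * (v + 1)             # all-ones feedback polynomial
rng = random.Random(0)
same_state = True
for _ in range(20):
    u = [rng.randint(0, 1) for _ in range(40)]
    perm = simile_permutation(len(u), v, rng)
    u_interleaved = [u[p] for p in perm]
    same_state &= final_state(u, feedback) == final_state(u_interleaved, feedback)

# Moving a bit to a position with a different index modulo (v + 1)
# changes the final state in general.
moved_across = final_state([1, 0, 0, 0, 0], feedback) != final_state([0, 1, 0, 0, 0], feedback)
```

The check works because the impulse response of the all-ones feedback register is periodic with period v + 1, so a single tail terminates both the plain and the interleaved sequence.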

A second criterion is needed if the coded bits are punctured:
each information bit should have associated with it, after
puncturing, one and only one coded bit. In this way the correction
capability of the code is uniformly distributed over all information
bits. This type of interleaver was introduced in [4]
for  a  two  dimensional  turbo-code  and  was  called  an  “odd- 
even”  type  of  interleaver. 

Using  a  block  helical  interleaver,  if  the  number  of  columns 
is  a  multiple  of  the  dimension  order,  which  is  3  for  3D-TC,  we 
can multiplex the coded bits of the straight sequence whose
index  in  time  modulo  3  is  zero  with  the  interleaved  Ia  coded 
bits  whose  index  in  time  modulo  3  is  one  and  with  the  inter- 
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leaved  Ib  coded  bits  whose  index  in  time  modulo  3  is  two.  In 
this  way  all  information  bits  have  associated  with  them  one 
and  only  one  coded  bit. 

III.  Applications 

The coding gain can be varied without changing the convolutional
code. In a good channel a rate half turbo code composed
of the uncoded sequence {x} and the punctured sequence
{y/ya} can be used. If the channel becomes noisier a
rate third code can be obtained by transmitting the {x}, {y} and
{ya} sequences. It was shown in [5] that the probability of
error is proportional to N^-1. For an even worse channel a
rate quarter code can be used by also transmitting the {yb} sequence,
which would make the probability of error proportional
to N^-2, and so on. As in the case of rate compatible
convolutional codes, the same turbo decoder can be used in all
cases.

In  Figure  1  we  use  the  sixteen  state  turbo  code  (v=4)  [1]. 
Each  interleaver  is  made  from  five  pseudo  random  inter¬ 
leavers  with  different  generator  polynomials  which  can  start 
from  different  states  produced  by  a  long  pseudo  random  gen¬ 
erator.  The  outputs  of  the  turbo  encoder  are  buried  in  noise 
whose  variance  is  known  and  can  be  changed  each  frame  or 
even  in  each  interleaved  sequence.  The  long  pseudo  random 
generator  which  generates  the  starting  states  of  the  inter¬ 
leavers  together  with  the  variance  of  the  noise  are  the  keys  to 
the  proposed  encryption  system.  We  assume  these  keys  to  be 
secret and known at the receiver end. This principle is similar
to that for CDMA transmissions.


Figure  1.  Three  dimensional  turbo  encoder 
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Abstract  —  An  optimal  interleaving  between  two 
component  encoders  of  a  turbo-code  is  proposed.  For 
any  real  constructable  interleaver  the  optimality  crite¬ 
rion  is  given.  For  component  codes  (CC)  with  known 
weight  distribution  (WD)  the  WD  of  the  turbo-code 
with  perfect  interleaving  is  calculated.  As  CC’s  the 
random  codes  and  terminated  convolutional  codes 
are  considered.  It  is  shown  that  the  often  observed 
“break”  in  the  performance  curves  for  turbo-codes  is 
a  result  of  their  “broken”  WD. 

I.  Introduction 

Any codeword of the recently introduced turbo-codes [1] has
the following structure: [X | XA | X'A], where X is the k-tuple
of the information bits, A is the k × r binary matrix, and X'
is a version of X with interleaved (permuted) coordinates.
As  CC’s  both  systematic  block  codes  and  convolutional  codes 
with  terminated  encoders  have  been  in  use  until  now.  The 
rate  of  the  whole  code  in  both  cases  is  R  =  k/(k  +  2r).  The 
linearity  of  turbo-codes  is  shown  in  [2]. 

II.  Optimal  Interleaving  and  WD  of  Whole 
Code  with  Known  WD’s  of  Component  Codes 

Dispose all $2^k - 1$ nonzero codewords of one CC into k groups
so that the ith (i = 1, ..., k) group consists of the codewords of
weight i in the information part. Note that if the information
vector X belongs to the ith group, then the permuted vector
X' will be in this group too.

The  aim  of  interleaving  is  to  produce  (by  manipulating  the 
weights  of  the  second  redundancy  part)  the  whole  codewords 
with  the  overall  weights  as  large  as  possible.  It  means  that 
within  each  group  the  first  redundancy  part  with  small  weight 
should  be  associated  after  interleaving  with  a  second  redun¬ 
dancy  part  with  large  weight  and  vice  versa. 

Let the WD of the CC be known in the form A(i,j), which
denotes the number of codewords with Hamming weight i of
the information bits and weight j of the redundancy bits. Within
each group dispose the codewords with non-decreasing
weights of the redundancy part so that for any l holds:
$j(i, l+1) \ge j(i, l)$, where $j(i, l)$, $l = 1, \ldots, \binom{k}{i}$, is the weight
of the redundancy part of the lth codeword in the disposed
ith group. Note that for any i and l the numbers j(i,l) are
determined by A(i,j). The lth codeword of the turbo-code in
this group has then weight

$$W(i, l) = i + j(i, l) + j\left(i, \tbinom{k}{i} - l + 1\right) \qquad (1)$$

Counting  all  codewords,  from  (1)  we  immediately  obtain  the 
WD  of  the  turbo-code  in  the  form  A(i,j ),  which  yields  also 
the  number  A(w)  of  codewords  with  weight  w. 
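
The pairing behind this construction, in which a first redundancy part of small weight is matched with a second redundancy part of large weight, can be sketched in a few lines. The function name foi_weights and the list of parity weights are made up for illustration; the sketch only pairs the l-th smallest weight with the l-th largest within one group.

```python
def foi_weights(i, parity_weights):
    """Overall codeword weights in group i when the l-th smallest first
    parity weight is paired with the l-th largest second parity weight."""
    j = sorted(parity_weights)
    m = len(j)
    return [i + j[l] + j[m - 1 - l] for l in range(m)]


# Hypothetical parity weights of the codewords in the group with i = 2.
group_parities = [1, 2, 2, 3, 4, 5]

paired = foi_weights(2, group_parities)               # small with large
unpaired = [2 + 2 * j for j in sorted(group_parities)]  # each weight with itself
```

In this toy group the pairing raises the smallest overall weight from 4 to 7 and narrows the spread of the weights, which is the effect the interleaving aims for.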

An  interleaving,  which  leads  to  the  same  WD  of  the  turbo¬ 
code  as  can  be  obtained  from  (1),  will  be  called  a  fully  optimal 
interleaving  (f.o.i.). 

Viewing W(i, l) for each i as a random variable of l, the
criterion for the optimal interleaving can be formulated as a
problem of minimizing its variance: $\sigma^2\{W(i, l)\} \to \min$.


III.  Turbo-Codes  with  Random  CC’s 
The random (k + r, k) code has the WD $A(w) = \binom{k+r}{w} / 2^r$,
which is obtained by equating the probabilities
of occurrence of a (k + r)-tuple and of a codeword, both of weight
w. However, for applying (1) the WD in the form A(i,j) is
required. From a similar equation for each group we get:

$$A(i, j) = \binom{k}{i} \binom{r}{j} \Big/ 2^r \qquad (2)$$

Because of the Vandermonde convolution, $\sum_{i+j=w} \binom{k}{i} \binom{r}{j} = \binom{k+r}{w}$,
the code with WD (2) is a random code too.
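
The Vandermonde convolution used in this step is easy to confirm numerically. The helper name check_vandermonde and the parameter values are illustrative only.

```python
from math import comb


def check_vandermonde(k, r):
    """True if sum over i + j = w of C(k, i) * C(r, j) equals C(k + r, w)
    for every weight w from 0 to k + r."""
    return all(
        sum(comb(k, i) * comb(r, w - i) for i in range(w + 1)) == comb(k + r, w)
        for w in range(k + r + 1)
    )


identity_holds = check_vandermonde(5, 3) and check_vandermonde(8, 8)
```

Dividing both sides by 2^r then shows that summing the group-wise distribution over i + j = w gives back the distribution A(w) of the random code.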

Combining (2) and (1), we see that for each group i (due to
the symmetry $\binom{r}{j} = \binom{r}{r-j}$) each parity-weight j is associated
after f.o.i. with a second parity-weight r − j. Thus, for all i, l:
W(i, l) = i + r. Furthermore, $A(i, r) = \binom{k}{i}$, $A(i, j \ne r) = 0$
and the WD of the whole code is: A(0) = 1, $A(w) = \binom{k}{w-r}$ for
r < w ≤ k + r, and A(w) = 0 otherwise. Hence, the minimum
distance is r + 1, which increases with increasing k.

IV.  Convolutional  Codes  as  CC’s 

The WD of these terminated codes for large k and rate R = 1/2
can be written as $A(i, j) / \binom{k}{i} = \binom{r}{j}\, p_{i,t}^{\,j} (1 - p_{i,t})^{r-j}$, where
$p_{i,t} = (1 - (1 - 2i/k)^{J(t)})/2$; for feed-back encoders J(t) is a
linear function of the time t = 1, ..., k and for feed-forward encoders
J(t) is a constant J equal to the number of nonzero
terms in the generator polynomial ($p_{i,t} = p_i$ in this case). According
to the DeMoivre-Laplace theorem the right-hand side
of the last WD can be approximated by a Gaussian distribution:
$A(i, j) / \binom{k}{i} \approx \exp\left(-(j - \mu_i)^2 / (2\sigma_i^2)\right) / (\sigma_i \sqrt{2\pi})$, with
mean $\mu_i = r p_i$ and variance $\sigma_i^2 = r p_i (1 - p_i)$, where for feed-back
encoders $p_i$ is the time average of $p_{i,t}$.

Due to the symmetry of the Gaussian distribution around its mean, one sees that after applying the f.o.i. rule (1) all codewords of the turbo-code within the $i$th group have weight $W(i,l) \approx i + 2\mu_i$, while the total number of codewords in this group is $\binom{k}{i}$. In the case of feed-forward encoders, $W(i,l) \approx i + 2Ji$ for small and large $i$ and $W(i,l) \approx i + r$ for $i$ near $k/2$ (which corresponds to random codes). Hence, the minimum distance is $1 + 2J$. For feed-back encoders the minimum distance increases with increasing $k$, and $W(i,l) \approx i + r$ for all $i$ except very small ones. Codes with these encoders are thus close to random codes. The large difference between the values $W(i,l)$ and between the numbers of codewords for small and central $i$ results in a “break” in the performance curves.

Using  the  proposed  WD’s,  one  can  obtain  the  bounds  on 
error  rate  for  turbo-codes  like  union  bounds  in  [2]. 

Abstract — The idea of iterative decoding of two-dimensional systematic convolutional codes (so-called turbo-codes) is extended to threshold decoding, which is presented in “Soft-In/Soft-Out” form. The computational complexity of the proposed decoder is very low. Surprisingly good simulation results are shown for the Gaussian channel.


I.  Preliminaries 

We restrict ourselves to binary data. A convolutional encoder with rate $R_c = k/(k+1)$ produces the output bits $x_u^{(1)}, \ldots, x_u^{(k+1)}$ at time $u = 0, 1, 2, \ldots$. During the transmission the noise sequence $e_u^{(i)}$ corrupts the coded bits. This sequence is statistically independent from digit to digit. Thus, we receive the sequence $x_u^{(i)} \oplus e_u^{(i)}$, $1 \le i \le k+1$, where $\oplus$ denotes the modulo-two addition. We assume that an error has occurred if $e_u^{(i)} = 1$, and $e_u^{(i)} = 0$ otherwise.

For threshold decoding it is important to provide information about the error symbol $e_u^{(i)}$. The a posteriori log-likelihood ratio for this symbol can be calculated as

$$L(e_u^{(i)} \mid y_u^{(i)}) = \ln\frac{P(e_u^{(i)}=0 \mid y_u^{(i)})}{P(e_u^{(i)}=1 \mid y_u^{(i)})} = 4\,\frac{E_s}{N_0}\,a\,|y_u^{(i)}| + L(e_u^{(i)}),$$

where $y_u^{(i)}$ is the matched filter output associated with the binary value $x_u^{(i)}$, $E_s/N_0$ is the signal-to-noise ratio, $a$ is the fading amplitude, and $L(e_u^{(i)})$ is the a priori log-likelihood ratio for symbol $e_u^{(i)}$.

Following [1], we shall use a special operation $\boxplus$, which denotes $L(v_1) \boxplus L(v_2) = L(v_1 \oplus v_2)$ for log-likelihood ratios of statistically independent binary random variables $v_1$ and $v_2$.


II.  Soft-In/Soft-Out  Threshold  Decoding 

Soft-In threshold decoding is well-known as A Posteriori Probability (APP) decoding [2]. The objective of Massey’s decoder is to maximize the probability $P(e_u^{(i)} = \xi \mid \{A_j\})$ that the error symbol $e_u^{(i)}$, $1 \le i \le k$, has a certain value $\xi \in \{0,1\}$ under the condition that we have a set $\{A_j\}$, $1 \le j \le J$, of parity checks orthogonal on $e_u^{(i)}$. Each parity check $A_j$ can be calculated as the modulo-two sum of $e_u^{(i)}$, a special selection of error symbols $e_s^{(a)}$, $1 \le a \le k$, $s \in S_j^{(i,a)}$, associated with the information bits $x_s^{(a)}$, and the error symbols $e_{s'}^{(k+1)}$, $s' \in S_j^{(k+1)}$, associated with the parity check bits $x_{s'}^{(k+1)}$. The sets $S_j^{(i,a)}$ and $S_j^{(k+1)}$, consisting of integers, depend on the generator polynomials of the code. The soft output of the decoder can be written as

$$L(\hat e_u^{(i)}) = \underbrace{\sum_{j=1}^{J}(1-2A_j)\,w_j}_{\text{extrinsic}} + \underbrace{4\,\frac{E_s}{N_0}\,a\,|y_u^{(i)}|}_{\text{channel}} + \underbrace{L(e_u^{(i)})}_{\text{a priori}},$$

where

$$w_j = \mathop{\boxplus}_{a=1}^{k}\ \mathop{\boxplus}_{s \in S_j^{(i,a)}} L(e_s^{(a)} \mid y_s^{(a)})\ \boxplus \mathop{\boxplus}_{s' \in S_j^{(k+1)}} L(e_{s'}^{(k+1)} \mid y_{s'}^{(k+1)}).$$

The $\boxplus$ operation can be approximated by sign and minimum operations. The value $1 - 2A_j$ is equal to $+1$ or $-1$. Thus, we need only compare operations and additions to calculate the extrinsic term.
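
To illustrate the sign-and-minimum approximation, the sketch below compares it with the exact combination of two log-likelihood ratios. It assumes the usual convention $L(v) = \ln[P(v=0)/P(v=1)]$ and uses the standard identity $L(v_1 \oplus v_2) = 2\,\mathrm{atanh}(\tanh(L_1/2)\tanh(L_2/2))$; the function names are hypothetical and this is not the authors' implementation.

```python
import math

def boxplus(l1, l2):
    """Exact LLR of v1 XOR v2 for independent binary v1, v2.

    Valid for moderate LLR magnitudes; for very large inputs tanh
    saturates to 1.0 in floating point and atanh would fail.
    """
    return 2.0 * math.atanh(math.tanh(l1 / 2.0) * math.tanh(l2 / 2.0))

def boxplus_approx(l1, l2):
    """Sign-and-minimum approximation: only a compare and a sign flip."""
    sign = 1.0 if (l1 >= 0) == (l2 >= 0) else -1.0
    return sign * min(abs(l1), abs(l2))

if __name__ == "__main__":
    for l1, l2 in [(1.0, 2.0), (3.0, -0.5), (-4.0, -4.0)]:
        print(l1, l2, boxplus(l1, l2), boxplus_approx(l1, l2))
```

The approximation always has the correct sign and never underestimates the magnitude of the exact value, which is why it is a common low-complexity substitute.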


III.  Iterative  (“Turbo”)  Decoding 

We can split the soft output into three terms: the so-called extrinsic information, representing the influence of the error bits orthogonal on the current bit $e_u^{(i)}$; the soft output of the channel; and the a priori value $L(e_u^{(i)})$. If a priori information about the error bits is available, it is also used in calculating the weights $w_j$. Only the extrinsic value (the information produced by the previous decoder) should be passed on as the new a priori value to the next decoder. The structure of the codec (with a random interleaver between two encoders for self-orthogonal codes) corresponds to [3].

IV.  Simulation  Results 

The plots in the Figure show the achieved bit error rates using up to 20 iterations over a Gaussian channel (code rate $\approx 1/2$, interleaver length 9990, two component codes with $J = 3$). 16 $\boxplus$ operations and 8 additions are needed per information bit and iteration for calculating the soft output.


The “break” in the curves after enough iterations is the result of the weight distribution of the feed-forward component codes used [4].

Abstract — The Efficient Reservation Virtual Circuit (or ERVC) protocol is a novel connection control protocol designed for constant-rate delay-insensitive traffic in gigabit networks. In the ERVC protocol, session durations are recorded and capacity is reserved only for the duration of the session, starting at the time it is actually needed. The protocol also has the “reservation ahead” feature, which allows a node to calculate the time at which the requested capacity will be available and reserve it in advance, thus avoiding wasteful repetition of the call setup phase. In addition, the protocol is robust to link and node failures, and allows soft recovery from processor failures.

I.  Introduction

The ERVC protocol is one of the two candidate protocols that we are considering for implementation in the 40 Gbit/s ATM-based fiber-optic Thunder and Lightning network currently being developed at UCSB. In designing the connection and flow control algorithms for this network our objectives were to ensure lossless transmission, efficient utilization of capacity, minimum pre-transmission delay for delay-sensitive traffic, and packet arrival in correct order. To meet these objectives, we have proposed the ERVC protocol for constant-rate traffic, and the Ready-to-Go Virtual Circuit (or RGVC) protocol for best-effort traffic and traffic with little delay tolerance. The RGVC protocol, described in [1], uses back-pressure and requires buffering at intermediate nodes, whereas the ERVC protocol, described in [2], uses reservations and requires little buffering at intermediate nodes.

II.  Why the ERVC protocol?

In standard reservation schemes (abbreviated SRVC) the capacity required by a session at an intermediate node is reserved starting at the time the setup packet arrives at the node. This is inefficient since the capacity reserved will actually be used at least one round-trip delay after the arrival of the packet at the node. This is because the setup packet has to travel from the intermediate node to the destination, an acknowledgement has to be sent to the source, and the first data packet of the session has to travel to the intermediate node. Over long transmission distances, the round-trip propagation delay may be comparable to, or even larger than, the holding time of a session. In particular, if a typical session requests capacity $r$ bits/sec, and transfers a total of $M$ bits over a distance of $L$ kilometers, then the maximum fraction of time that the capacity is efficiently used in an SRVC protocol is

$$e = \frac{M}{M + 2 r L \tau},$$

where $\tau = 5$ µs/km is the propagation delay in the fiber. Typical values of these parameters for the Thunder and Lightning network are $r = 10$ Gbit/s, $M = 0.5$ Gbit, and $L = 3000$ km (coast-to-coast communication), which yields $e = 0.625$. In contrast, the efficiency factor $e$ for the ERVC protocol can be as large as $e = 1$, independently of the parameters $r$, $L$, and $M$.
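
The quoted efficiency can be reproduced with a few lines of arithmetic. The sketch below assumes that, under SRVC, capacity is held for the session holding time $M/r$ plus one round-trip propagation delay over $L$ km, and is used only during the holding time; the function and parameter names are hypothetical.

```python
def srvc_efficiency(rate_bps, message_bits, distance_km, delay_s_per_km=5e-6):
    """Fraction of the reserved time during which capacity carries data."""
    holding_time = message_bits / rate_bps            # seconds of actual use
    round_trip = 2.0 * distance_km * delay_s_per_km   # seconds held but idle
    return holding_time / (holding_time + round_trip)

if __name__ == "__main__":
    # r = 10 Gbit/s, M = 0.5 Gbit, L = 3000 km, 5 microseconds per km
    print(srvc_efficiency(10e9, 0.5e9, 3000))
```

With these values the holding time is 50 ms and the round-trip delay is 30 ms, giving 50/80 = 0.625. Shorter sessions or longer paths push the SRVC efficiency lower.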

The “reservation ahead” feature of the ERVC protocol allows sessions to reserve capacity in advance for use at a later time. Thus, if capacity is available for a session starting at a time that is within the delay that the session can tolerate, the call is accepted on its first attempt. This feature, therefore, avoids unnecessarily prolonged call setup phases, reduces a session's susceptibility to blocking, and leads to efficient utilization of the available capacity.

III.  Basic  description  of  the  protocol 

In the ERVC protocol, each network node keeps track of the utilization profile of each outgoing link, which describes the residual capacity available on the link as a function of time. The utilization profile is stored as a linked list of records, and is updated efficiently. Each intermediate node reserves the required capacity starting at the time at which this capacity will actually be used (which is at least one round-trip delay after the arrival of the setup packet at the node), and for a time equal to the session duration. If the session duration is unknown, it is treated as infinite, and capacity is reserved for that session for an unspecified duration (as in standard reservation schemes). If the capacity is not available at the time requested, the setup packet may make a reservation starting at the first time the capacity becomes available, if the session can tolerate the delay. Since capacity is blocked for other sessions only for the duration of the call and is available for the remaining time, this allows a considerably greater number of sessions to be served. It also avoids the wasteful repetition of the call setup process, because it enables a session to reserve the required capacity in its first attempt, possibly at a time later than the requested time. If adequate capacity is available at every intermediate node, the source eventually receives an acknowledgement from the destination and begins transmitting data. If the time at which adequate bandwidth first becomes available exceeds the delay tolerance of the session, the call is blocked and is reattempted later, probably via a different path. The ERVC protocol requires a pre-transmission delay at least equal to the round-trip propagation delay between the source and the destination (as all reservation protocols do).
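
The reservation-ahead step at a single node can be sketched as follows. This is an illustrative sketch only, not the protocol's actual data structure: it assumes the utilization profile is a piecewise-constant function kept as two parallel sorted Python lists (the abstract mentions a linked list of records), that all times are non-negative and durations finite, and the class and method names are hypothetical.

```python
import bisect

class UtilizationProfile:
    """Residual capacity of one outgoing link as a step function of time."""

    def __init__(self, capacity):
        # Segment n covers [times[n], times[n+1]); the last one is open-ended.
        self.times = [0.0]
        self.free = [capacity]

    def _split(self, t):
        """Make sure a breakpoint exists at time t and return its index."""
        i = bisect.bisect_right(self.times, t) - 1
        if self.times[i] == t:
            return i
        self.times.insert(i + 1, t)
        self.free.insert(i + 1, self.free[i])
        return i + 1

    def _fits(self, start, duration, rate):
        """True if `rate` is free on the whole interval [start, start+duration)."""
        end = start + duration
        i = bisect.bisect_right(self.times, start) - 1
        while i < len(self.times) and self.times[i] < end:
            if self.free[i] < rate:
                return False
            i += 1
        return True

    def reserve(self, t_req, duration, rate, tolerance):
        """Reserve at the earliest feasible start; return it, or None if blocked."""
        # The earliest feasible start is t_req itself or a later breakpoint.
        candidates = [t_req] + [t for t in self.times if t > t_req]
        for start in candidates:
            if self._fits(start, duration, rate):
                if start - t_req > tolerance:
                    return None  # exceeds the session's delay tolerance
                i = self._split(start)
                j = self._split(start + duration)
                for n in range(i, j):
                    self.free[n] -= rate
                return start
        return None  # rate exceeds what the link can ever offer

if __name__ == "__main__":
    link = UtilizationProfile(capacity=10.0)
    print(link.reserve(t_req=5.0, duration=10.0, rate=8.0, tolerance=0.0))
    print(link.reserve(t_req=6.0, duration=4.0, rate=5.0, tolerance=20.0))
```

The second request cannot be served at its requested time, so it is booked at the first instant the residual capacity suffices, provided the shift stays within its tolerance. With a tolerance shorter than that shift the call would be blocked instead.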

Abstract — We introduce the spatial coherence quality of service requirement for real-time point-to-multipoint communications in distributed systems. The notions of multicast end-to-end delay and global jitter are defined and their relationships with the spatial coherence are described.

I.  Introduction 

As communications system designers strive to provide applications with increasingly elaborate services, real-time multicasting applications arise in which a message sent from a source to a set of sinks is required to meet specified time and geographical (spatial) constraints. The underlying communications systems should therefore support a spatial coherence quality of service requirement. We improve the steadiness and tightness metrics, defined as functions of maximum and minimum individual point-to-point delays [1], to provide a spatial coherence guarantee. The next section introduces the notions of multicast end-to-end delay and global jitter and then gives their relationships to the spatial coherence. In Section III, three deterministic scheduling policies for point-to-point real-time communications are graded with respect to their suitability to spatial coherence.

II.  Multicast  End-to-end  Delay,  Global  Jitter 
and  Spatial  Coherence 

Given a data packet transmitted over a multipoint connection, the multicast end-to-end delay is defined as an $N$-dimensional vector $\vec d = (d_1, d_2, \ldots, d_N)$, where $N$ is the number of elements in the recipient set, and $d_i$ is the $i$th individual end-to-end delay. The scalar value of the multicast end-to-end delay is derived from the modulus of vector $\vec d$ as $d = \sqrt{\sum_{k=1}^{N} d_k^2}$. It is a scalar function of the variables $d_1, d_2, \ldots, d_N$. The infinitesimal variation in the value of $d$ is then derived as:

$$\delta d = \sum_{k=1}^{N} \frac{d_k}{d}\,\delta d_k \qquad (1)$$

In the above equation, the term $\delta d_k$ of the right-hand part is the individual delay jitter for sink $k$ ($j_k$). The left-hand part, $\delta d$, is the global delay jitter that takes into account all the individual delay jitters of the multicast connection. It will further be denoted as $j_s$. Equation 1 is then rewritten as $j_s = \sum_{k=1}^{N} \frac{d_k}{d}\,j_k$. The spatial coherence is defined as a measure of the skew among the time instants at which a message transmitted over a real-time multicast connection is received at the different sinks. The spatial coherence is achieved when individual end-to-end delays have an equal value, in which case $\sqrt{N}\,d_k/d = 1$ for all $k$, $1 \le k \le N$. Hence, in order to guarantee spatial coherence, the ratio $\sqrt{N}\,d_k/d$ must be kept as close as possible to unity. In other words, the following double inequality should hold:

$$1 - \zeta < \frac{\sqrt{N}\,d_k}{d} < 1 + \zeta \qquad (2)$$

1This  work  was  done  in  the  framework  of  the  IMAG  project 
RACINES 


where $\zeta$ is a positive scalar very close to zero. From the above definition of the global jitter, and imposing a bound $J_s$ on it, we derive equation 3.

$$\frac{1}{\sqrt N}(1-\zeta)\sum_{k=1}^{N} j_k < J_s < \frac{1}{\sqrt N}(1+\zeta)\sum_{k=1}^{N} j_k \qquad (3)$$

Assuming that $\zeta$ is close to zero, the above equation simplifies to:

$$J_s\sqrt{N} = \sum_{k=1}^{N} j_k \qquad (4)$$

from which the bounds on individual delay jitters can be solved.
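
As a small numerical illustration of these definitions, the sketch below computes the scalar multicast delay as the modulus of the delay vector, the global jitter as the delay-weighted sum of individual jitters, and a per-sink jitter bound obtained from (4). Splitting the bound equally among the sinks is an assumption made here for illustration; the text does not prescribe how the budget is divided, and the function names are hypothetical.

```python
import math

def multicast_delay(delays):
    """Scalar multicast end-to-end delay: modulus of the delay vector."""
    return math.sqrt(sum(d * d for d in delays))

def global_jitter(delays, jitters):
    """Global jitter: sum over sinks of (d_k / d) * j_k."""
    d = multicast_delay(delays)
    return sum(dk / d * jk for dk, jk in zip(delays, jitters))

def equal_share_jitter_bound(global_bound, n_sinks):
    """Per-sink bound if the total sqrt(N) * J_s from (4) is split equally."""
    return global_bound * math.sqrt(n_sinks) / n_sinks

if __name__ == "__main__":
    delays = [2.0, 2.0, 2.0, 2.0]   # equal delays: spatially coherent case
    jitters = [1.0, 1.0, 1.0, 1.0]
    print(multicast_delay(delays), global_jitter(delays, jitters))
    print(equal_share_jitter_bound(2.0, 4))
```

In the coherent case each weight $d_k/d$ equals $1/\sqrt{N}$, so the global jitter is the sum of the individual jitters divided by $\sqrt{N}$, which is relation (4).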

III.  Deterministic  Scheduling  Policies 

We consider three deterministic scheduling policies for point-to-point real-time communications: 1) Earliest Due Date for Jitter (EDD-J) [2], 2) Stop & Go (S & G) [3], and 3) Hierarchical Round Robin (HRR) [4]. Each mechanism is graded, in the range 0 to 3, according to three criteria: a) the suitability to guarantee throughput or bit rate, b) the suitability to guarantee end-to-end delay, and c) the suitability to spatial coherence as a result of the previous two criteria. The scores are presented in the following table.


          Throughput   Delay   Spatial Coherence
EDD-J         1          3             3
S & G         3          3             3
HRR           3          0             1

IV.  Conclusion 

From three examples of real-time point-to-point scheduling techniques, we showed how spatial coherence is achievable from the observance of individual end-to-end delay and jitter bounds. Thus the research results in real-time point-to-point communications can easily be extended to address the issue of the spatial coherence quality of service requirement of real-time multicasting applications. The case of statistical traffic and statistical multicast real-time requirements can be dealt with in an approach similar to the one we used for deterministic traffic and requirements.

Abstract — Peakedness was originally developed by teletraffic engineers as a tool for characterizing call arrival processes at a trunk group. We generalize the peakedness theory to include a class of stochastic models used in studies of high-speed networks and apply it to the approximate analysis of a statistical multiplexer.


I.  Introduction 

In  networks  based  on  the  Asynchronous  Transfer  Mode 
(ATM),  information  is  transmitted  asynchronously  over  high¬ 
speed  links  in  the  form  of  53-byte  units  called  cells.  Accurate 
traffic  characterization  is  a  crucial  step  in  performing  network 
resource  allocation  and  dimensioning. 


II.  Generalized  Arrival  Process 

Define a rate process $\{R_t, t \ge 0\}$ to be a strictly stationary random process with finite, nontrivial first two moment measures. The process $\{R_t, t \ge 0\}$ is to be understood in the generalized function sense with the interpretation that $R_t\,dt$ represents the amount of work arriving in the infinitesimal interval $[t, t+dt)$. The generalized arrival process is then defined by

$$N_t = \int_0^t R_u\,du, \qquad (1)$$

where $N_t$ represents the amount of work arriving in the interval $(0, t]$.

The standard arrival process defined as a stationary point process is a special case with

$$R_t = \sum_{i=1}^{\infty} b_i\,\delta(t - T_i), \qquad (2)$$

where $b_i$ is the number of arrivals at the $i$th arrival epoch, $T_i$, and $\delta(t)$ is the Dirac delta function. Another special case is the discrete-level fluid process with

$$R_t = \sum_{i=1}^{\infty} f_i\,\mathrm{rect}\!\left(\frac{t - T_i}{T_{i+1} - T_i}\right), \qquad (3)$$

where $f_i$ is the fluid flow rate, $T_i$ is the epoch of the $i$th transition, and $\mathrm{rect}(t) = u(t) - u(t-1)$, where $u(t)$ is the unit step function.


III.  Generalized  Peakedness 

We introduce a concept of peakedness for a general arrival process as defined by (1). The arrival process is offered to an infinite server system which is represented by an i.i.d. process, $\{D_t, t \ge 0\}$, with marginal cdf $F$. Define

$$S_t = \int_0^t 1\{D_u > t - u\}\,R_u\,du, \qquad (4)$$

1The  first  author  has  been  supported  by  an  NSERC  Postgradu¬ 
ate  Scholarship. 


with the following interpretation: In the interval $[u, u+du)$, $R_u\,du$ units of work are offered to a new server, introduced at time $u$, which removes this work from the system after a duration $D_u$. Then $S_t$ represents the amount of work present in the system at time $t$. The peakedness functional with respect to the service time cdf $F$ is defined by

$$z[F] = \lim_{t\to\infty} \frac{\mathrm{Var}[S_t]}{E[S_t]}. \qquad (5)$$

For the case of an orderly point process, the definitions (4) and (5) reduce to the standard concept of peakedness.

The following result of Eckberg [1] extends to our generalized notion of peakedness:

$$z[F] = 1 + \frac{\mu}{\lambda}\int_{-\infty}^{\infty} [k(x) - \lambda\,\delta(x)]\,F^*(x)\,dx. \qquad (6)$$

Here, $F^*$ is the autocorrelation function of $F^c = 1 - F$, $\mu^{-1} = \int_0^\infty F^c(x)\,dx$ is the mean service time, $\lambda = E[R_t]$ is the mean arrival rate, and $k(x) = \mathrm{Cov}(R_{t+x}, R_t)$ is the covariance function of the rate process.


IV.  Application 

The  generalized  peakedness  can  be  obtained  in  closed  form 
via  (6)  for  a  large  class  of  stochastic  traffic  models,  including 
the  popular  Markov  modulated  fluid  models.  In  particular, 
the  peakedness  function  of  a  Markov  on-off  fluid  with  peak 
rate  r,  mean  on  time  and  mean  off  time  a-1  with  respect 
to  constant  service  time  distribution  is  given  by 

+  P  +  Ml  -  (7) 

Peakedness  can  also  be  estimated  empirically  through  mea¬ 
surements  of  an  actual  traffic  stream  and  then  used  to  con¬ 
struct  a  stochastic  traffic  model. 

Lee  and  Mark  [2]  propose  a  method  for  approximating  a 
general  arrival  process  with  a  more  computationally  tractable 
superposition  of  two  types  of  on-off  Markov  fluid  sources  by 
matching  central  moments  of  the  rate  process  Rt  and  an  in¬ 
dex  of  dispersion  measure.  Since  the  peakedness  function  con¬ 
tains  strictly  more  information  about  the  arrival  process  than 
the  index  of  dispersion,  a  more  accurate  traffic  characteriza¬ 
tion  can  be  achieved  by  using  the  peakedness  function  (7)  to 
perform  the  match.  We  demonstrate  the  effectiveness  of  our 
approach  with  an  application  to  the  analysis  of  a  statistical 
multiplexer. 


References 

[1]  A. E. Eckberg, “Generalized Peakedness of Teletraffic Processes,” Proc. 10th International Teletraffic Congress, Montreal, Canada, 1983.

[2]  H. W. Lee and J. W. Mark, “ATM Network Traffic Characterization Using Two Types of On-Off Sources,” INFOCOM '93, pp. 152-159, 1993.




Fault  Detection  in  Communication  Protocols  using  Signatures 

G.  Noubir,  K.  Vijayananda,  P.  Raja 
Swiss  Federal  Institute  of  Technology,  Lausanne, 

Computer  Engineering  Department,  EPFL-DI-LIT, 
noubir,  vijay,  raja@di.epfl.ch 


Abstract — Run-time fault detection in communication protocols is essential to detect faults that cannot be detected during the testing phase. In this paper, we use a polynomial-based signature function to detect run-time faults in communication protocols.

I.  Introduction 

Signature Analysis [2] and FSM methods [4, 1] are two popular methods that are used to verify the control flow of programs. Run-time fault detection in communication protocols is essential to detect faults that arise due to coding defects, memory problems and external disturbances. In this paper, we summarize the results presented in [3]. We propose a new signature function, which is based on polynomials, to detect run-time faults in communication protocols. Every state has a signature which represents the signature of all paths leading to that state, and this is stored in a static signature table. The run-time path is transformed into a number (signature) using the signature function and compared with the static signature table for its correctness. While the FSM table has at least two dimensions, the static signature table has only one dimension.

II.  Signature  Generation 

Let $A = (Q, \Sigma, \delta)$ be an FSM with a state $S_0$ such that it has a predefined signature equal to zero and all the other states are reachable from $S_0$. The signature function is a polynomial with the values of states and events as coefficients and maps every path beginning at state $S_0$ into a value from an algebraic field $F$. For any two paths $C_1$ and $C_2$, the signature function must satisfy: $\exists p < 1$: $\mathrm{Prob}[\mathrm{Signature}(C_1) = \mathrm{Signature}(C_2) \mid C_1 \ne C_2] \le p$, where $p$ is defined as the aliasing probability of the signature function. We use three kinds of signature depending on the availability of the state and event information. They are full-path, event, and state signatures. The polynomials associated with these signatures are given below. The signature is computed by evaluating the polynomial at a given point $x_0$.

Full-path: $P_C(x) = \sum_{i=0}^{n-1}\left(s_i x^{2(n-i)} + e_i x^{2(n-i)-1}\right) + s_n$

State: $P_C(x) = \sum_{i=0}^{n-1} s_i x^{n-i} + s_n$

Event: $P_C(x) = \sum_{i=0}^{n-1} e_i x^{n-i-1}$

where $s_i$: state value, $e_i$: event value, $n$: length of the state path, and $x$: a number from a given Galois field $F$. The following theorem gives an upper bound on the aliasing probability of the signature function.

Theorem 1  $\mathrm{Prob}[\mathrm{Signature}(C_1) = \mathrm{Signature}(C_2) \mid C_1 \ne C_2] = \frac{1}{|F|}$

Corollary 1  The probability that an illegal path is undetected is equal to $\frac{1}{|F|}$.

°This  work  was  partially  supported  by  the  Swiss  PTT  project 
F&  E  N°309. 


Fig.  1:  Event- State  assignment  example. 


When many paths lead to the same state, they are called parallel paths. Parallel paths should result in the same signature. This will reduce the complexity of signature verification. This constraint is used in generating the system of linear equations which can be used to assign values to the states and events (state-event assignment problem) [3].


III.  Example 

We explain the fault detection technique using the FSM shown in Fig. 1. Solving the state-event assignment problem for $x = 2$, we have $S_1 = 1$, $S_2 = 2$, $S_3 = 3$, $S_4 = 4$, $a = 1$, $b = 2$, $c = -21$, and $d = -7$. The initial state ($S_1$) has a signature value equal to 0. The signature is computed for $x = 2$. Consider the path $S_1 b S_4 c S_2 a S_3$. The step-wise computation of the full-path signature is shown in Tab. 1.


Path                    Computation              Signature
$S_1 b$                 (0*2 + 1)*2 + 2          4
$S_1 b S_4 c$           (4*2 + 4)*2 + (-21)      3
$S_1 b S_4 c S_2 a$     (3*2 + 2)*2 + 1          17

Tab. 1: Full-path signature
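
The three steps in Tab. 1 follow a Horner-style update: at each step the running signature is multiplied by $x$, the state value is added, the result is multiplied by $x$ again, and the event value is added. The sketch below reproduces the table under that reading; it works over the integers as the example does, and the function name and argument layout are hypothetical.

```python
def full_path_signatures(state_values, event_values, x, initial=0):
    """Return the running full-path signature after each (state, event) pair."""
    signature = initial
    trace = []
    for s, e in zip(state_values, event_values):
        # One step of the table: (signature * x + state) * x + event
        signature = (signature * x + s) * x + e
        trace.append(signature)
    return trace

if __name__ == "__main__":
    # Values from the example: S1=1, S4=4, S2=2 and events b=2, c=-21, a=1
    print(full_path_signatures([1, 4, 2], [2, -21, 1], x=2))  # [4, 3, 17]
```

A run-time monitor would compare each running value with the entry stored for the reached state in the static signature table and flag a fault on mismatch.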


IV.  Conclusion 

We  have  presented  a  signature-based  method  for  detecting 
run-time  faults  in  communication  protocols.  This  technique 
has  been  applied  to  detect  faults  in  protocols  like  ABP  and 
TP4  [3]. 

Abstract — Message arrivals encountered in digital transmission over most real communication channels are not independent but appear in clusters. We propose a model of such a bursty $K$-ary source using a Markov chain with two states. It is shown that the protocol information of this sporadic source can be drastically reduced on the one hand by not encoding intermessage information (e.g., the starting point of a packet) and on the other hand by buffering and reordering messages. Trade-offs between reduced protocol information and message delays are also considered.

Summary 

Messages such as commands, inquiries, file transmissions, and the like, traveling through a network, are extremely bursty. A model of a bursty $K$-ary source using a Markov chain with two states, “quiet” (or “idle”) and “busy” (sometimes also called “active”), is proposed as a sufficiently realistic model for many such sources. In the “quiet” state, the source transmits no (message) information, while in the “active” state, the source acts as a $(K-1)$-ary discrete memoryless source (DMS). The transition probabilities between states describe the sporadic nature of the source. Let $p$ and $q$ denote the probability of changing from the quiet to the busy state and from the busy to the quiet state, respectively. With this, the information rate in the steady state, defined as the entropy per source letter [bits/time unit], $U$, can be calculated to be

$$U = \underbrace{\frac{p}{p+q}\log(K-1)}_{\text{message information}} + \underbrace{\frac{p}{p+q}\,h(q) + \frac{q}{p+q}\,h(p)}_{\text{protocol information}} \qquad (1)$$

where

$$h(p) = -p\log p - (1-p)\log(1-p)$$

is the binary entropy function.

Since each message symbol contains $\log(K-1)$ bits of information and since the source is producing message symbols during a fraction $p/(p+q)$ of time, the first term on the right side of (1) may be interpreted as the entropy of the messages. Similarly, the second term may be viewed as the entropy in the message length and the third term as the entropy in the length of the quiet periods. The information in the source output consists of two parts: a message part and a protocol part. Although such a separation seems reasonable intuitively, it is by no means entirely apparent that message information and protocol information can be separated completely from one another and considered independently. We show that a significant fraction of the channel capacity must be used for protocol information when either the expected message length is short ($q \gg 0$), or the quiet sequences are much longer than the message sequences ($q/p \gg 1$), or the signalling alphabet is small.
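
The split between message and protocol information can be evaluated numerically. The sketch below assumes base-2 logarithms and follows the interpretation given in the text: message symbols occur a fraction $p/(p+q)$ of the time and carry $\log(K-1)$ bits each, while the protocol part is the entropy of the busy/quiet state sequence, $[p\,h(q) + q\,h(p)]/(p+q)$. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """h(p) = -p log2 p - (1-p) log2 (1-p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def information_rate(K, p, q):
    """Return (message, protocol) information in bits per time unit."""
    busy = p / (p + q)    # steady-state probability of the busy state
    quiet = q / (p + q)   # steady-state probability of the quiet state
    message = busy * math.log2(K - 1)
    protocol = busy * binary_entropy(q) + quiet * binary_entropy(p)
    return message, protocol

if __name__ == "__main__":
    for K, p, q in [(3, 0.5, 0.5), (257, 0.01, 0.1), (3, 0.01, 0.5)]:
        msg, proto = information_rate(K, p, q)
        print(K, p, q, msg, proto, proto / (msg + proto))
```

The last column is the share of the total rate spent on protocol information. It grows when messages are short, quiet periods are long, or the alphabet is small, in line with the statement above.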


Whereas message information must generally be encoded losslessly, it is usually not necessary to encode all protocol information. For instance, in a packet-switching network, messages are generally delayed by varying amounts in passing through the network in different ways and their arrival order may be changed. One can save protocol information by not resolving intermessage time delays. If we are not interested in “full-reconstructability” of the entire source output including the messages in their original order and/or the exact length of quiet periods, then we can use less than an average of $h(q) = H(L)/E(L)$ bits per message symbol to indicate the length $L$ of the messages and/or less than $q/p \cdot h(p)$ bits per message symbol to indicate the lengths of the quiet periods. It is precisely the possibility of reordering and buffering the messages that permits a decrease in the amount of protocol information to be transmitted. We present both coding strategies that maintain messages in their original order going through the network and coding strategies that ignore message order. Unfortunately, the reduction in protocol information by the latter strategies is gained mostly at the cost of an enlarged message delay. One of the most important performance measures, however, is the average (or the maximum) delay required to deliver a message from the origin to the destination. We analyze the trade-off between the maximum tolerable delay and the amount of protocol information that must be sent. It is shown that the minimal necessary protocol information required to encode the message length decreases exponentially fast with increasing delay. Examples are given to illustrate and to compare the various strategies. Finally, certain generalizations of the concept of sporadic sources are devised for some related applications.

Acknowledgements 

The  author  is  very  grateful  to  J.L.  Massey  for  his  help  and 
many  fruitful  discussions. 

References 

[1]  Gallager, R.G., Information Theory and Reliable Communication, John Wiley & Sons, Inc., 1968

[2]  Bertsekas, D. and Gallager, R.G., Data Networks, Prentice-Hall, Inc., 1992




An  Analysis  Approach  for  Cell  Loss  Rate  of  Shared  Buffer  ATM  Switching 


Zhao  Yu-biao,  Yu  Jian-ping,  and  Liu  Zeng-ji 
National  Key  Lab.  of  ISN,  Xidian  University,  Xi’an  710071,  P.R.China 


Abstract - A novel approach is presented for analyzing
the cell loss rate of shared buffer ATM switching. It
provides a new means to solve problems in more complex
queueing systems. It is an accurate algorithm that replaces
conventional methods by employing a one-step transition
matrix.
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SUMMARY 

ATM is a promising transport and switching technique for
a future B-ISDN. One of the major areas under study in ATM
switching systems is switch architecture. Among the various
kinds of ATM architectures, shared buffer ATM switching
is the best choice in terms of cell loss rate, throughput and
switching delay [1].

The relation between cell loss rate and shared buffer
size has been analyzed in the literature. Those results are not
accurate because the numbers of cells that arrive in
each time slot destined for the individual output ports are
not independent. Since the total number of cells arriving
in each time slot cannot exceed the number of switch input ports,
cells destined for certain output ports cannot also be
destined for the other output ports. This negative correlation
causes the sum of the queues for the output ports
to be stochastically smaller than what this sum would be
were the queues independent. Based on this observation,
an accurate approach is developed for the analysis of cell loss
rate in a shared buffer ATM switch. The analytic
method is outlined as follows.

The shared buffer switch has N input ports and N output
ports. In each time slot, cells arrive at each input link
according to a Bernoulli process with probability p < 1.
Each cell is equally likely to be destined for any of the N output
ports. At each input port, successive arriving cells
are independently destined for their respective
output ports.

Let $a_i$ represent the probability that i cells arrive at
the switch in each time slot. Based on the assumption
above, $a_i$ is binomially distributed. That is

$$a_i = \binom{N}{i} p^i (1 - p)^{N-i}$$
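
The binomial arrival probabilities can be tabulated directly. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and lumping overflowing arrivals into the last state of a truncated state space is an assumption made here only so that the arrival-only transition matrix stays row-stochastic.

```python
from math import comb


def arrival_pmf(n_ports, p):
    """Return [a_0, ..., a_N], the binomial probabilities of i arrivals per slot."""
    return [comb(n_ports, i) * p**i * (1 - p) ** (n_ports - i)
            for i in range(n_ports + 1)]


def arrival_matrix(pmf, max_state):
    """Arrival-only transition matrix on states 0..max_state (cells in the switch).

    Assumption: arrivals that would exceed max_state are lumped into max_state.
    """
    size = max_state + 1
    matrix = [[0.0] * size for _ in range(size)]
    for n in range(size):
        for i, a_i in enumerate(pmf):
            matrix[n][min(n + i, max_state)] += a_i
    return matrix


if __name__ == "__main__":
    pmf = arrival_pmf(4, 0.5)
    print(pmf)
    print(arrival_matrix(pmf, 6)[0])
```

Choosing max_state large enough makes the lumped overflow mass negligible, which mirrors the truncation argument used for the steady-state solution.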


It is assumed that the state of the Markov chain is represented
by the number of cells in the switch. Then the
probability transition matrix of arriving cells, regardless of
leaving cells, is


In each time slot, the number of cells transmitted
to the output links equals the number of output ports that have
queued cells. Let $b_{ni}$ be the probability that i output
ports have queued cells when the total number of
cells in the switch is n. The following equation
can then be derived by using a Markov chain. That is


bni  — 


3  =  0  _______ 


(*'-!)! 


N\ 

Nn(N  -*)! 


The probability transition matrix of leaving cells, regardless
of arriving cells, is


Pb  = 
From the analysis above, we can derive the actual
probability transition matrix for the switching system.
That is

P = Pb Pa

In order to solve for the steady-state probabilities from this matrix,
let the largest state number be large enough that
the difference due to using a finite state space instead of the
infinite one is negligibly small; then the steady-state probability
distribution can be easily obtained by standard methods.
This gives the accurate relation between cell loss rate and
buffer size in the switch.

Abstract - Automatic Repeat Request (ARQ) has been widely used for
its high reliability and ease of implementation, but it performs poorly
when the channel is noisy. This paper presents an adaptive error
control scheme that combines FEC and ARQ. An encoder and a decoder
for large constraint length convolutional codes are built with TMS320E25
microprocessors to implement error control. Based on the channel condition, the
system can modify the constraint length of the diffuse convolutional codes automatically.
Such an adaptive error control system, combined with an ARQ system based on
the HDLC protocol, transmits data efficiently at high speed over bad radio
channels.

I. FEC/ARQ Adaptive Error Control Mode

The  adaptive  error  control  system  block  diagram  is  shown  in  Fig.  1,  where  the 
CCU(Communication  Control  Unit)  implements  HDLC  protocol  and  system 
control. 


Assume the employed codes are denoted $C_1, C_2, \ldots, C_6$, where
$C_2, C_3, \ldots, C_6$ are self-orthogonal diffuse convolutional codes (2,1,4X+2)
with X scaled as 32, 64, 128, 256, 512, and $C_1$ denotes CRC, meaning
that the transmitted data is protected only by cyclic redundancy check bits and
the ARQ protocol is employed. The codes $C_2, C_3, \ldots, C_6$ can correct burst
errors of 2X bits and random errors of 2 bits. Their rate is only 0.5, and decoding
introduces delay. As the diffuse length grows, the decoding delay and
the number of protection bits rise and the throughput decreases. The central
problem is therefore to determine dynamically which of the available codes achieves the
highest throughput for each channel state.

The system uses two frame structures: one is the special frame, Fs, which
begins with a special frame flag 0FFH followed by the diffuse length index; the other
is the data frame, Fd, which begins with a data frame flag 00H
followed by the encoded data.

In the scheme, the channel condition is indicated by the frame error probability.
The procedure can be described briefly as follows. Suppose that
code $C_k$ is in use and the data packet (raw data) length is L bits. The transmitter
counts the successfully transmitted frames in every M frames (including
retransmitted frames, but excluding Fs frames) and denotes the result by V. If V
< N (N is a threshold), the channel is in a worse condition: the transmitter
attempts to adopt $C_{k+1}$, transmits an Fs frame to the receiver, and restarts
the count of successful transmissions. If N ≤ V < M, $C_k$ is
suitable for the channel condition. If V = M, the transmitter checks whether
3M consecutive transmissions have succeeded without retransmission; if not,
$C_k$ is kept unchanged; otherwise the transmitter attempts to adopt
$C_{k-1}$ and transmits an Fs frame to the receiver. In this way, the code is
selected dynamically to achieve the highest throughput.
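
The selection rule can be sketched as a small function. This is an illustration under stated assumptions rather than the authors' firmware: the function and parameter names are hypothetical, the code indices lost in the text are read as a step to the next stronger code (k + 1) and the next weaker code (k - 1), and the index is clamped to the range 1..6 of the six codes.

```python
def next_code_index(k, window, clean_streak, M=5, N=3, k_min=1, k_max=6):
    """Return the code index to use after one observation window.

    k            -- index of the code currently in use (1 = CRC only, 6 = longest)
    window       -- list of M booleans, True for a successfully transmitted frame
    clean_streak -- consecutive transmissions so far that needed no retransmission
    M, N         -- window length and threshold (defaults follow the test setup)
    """
    V = sum(window)                      # successful frames in this window
    if V < N:
        return min(k + 1, k_max)         # channel worse: assumed step to stronger code
    if V == M and clean_streak >= 3 * M:
        return max(k - 1, k_min)         # long clean run: assumed step to weaker code
    return k                             # N <= V < M, or V == M without a long run


if __name__ == "__main__":
    print(next_code_index(2, [True, False, False, True, False], 0))
    print(next_code_index(2, [True] * 5, 15))
```

In a real transmitter the caller would also send an Fs frame and reset its counters whenever the returned index differs from k.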

II. The FEC Board and Operation Principle

Encoding and decoding are accomplished by a single board (the FEC board), which
contains two TMS320E25 processors and a peripheral interface unit. Because the algorithm is
executed in software, the circuit is simple, the board is small, and it is
convenient to change from one code to another. The 4K×16 on-chip EPROM is
sufficient to hold the five subroutines corresponding to the available codes
$C_2, \ldots, C_6$. When the CCU interrupts the FEC board and sends it a code index,
the TMS320E25 executes the corresponding subroutine. The ARQ
protocol is handled by the CCU; at the same time, the CCU keeps statistics of
successful frames, selects the code, and conveys the index
number of the selected code to the FEC board.

Convolutional code synchronization is achieved by means of frame synchronization.
The Barker(11) sequence is used as the synchronization code, and five Barker(11) sequences form a
synchronization code group to ensure that at least one of the five is not disturbed.
The TMS320E25 on the FEC board performs a correlation calculation to decide
whether the received frame is in synchronization or not. Following the
synchronization code group, an inverted (NOT) Barker(11) sequence indicates the end of
the synchronization header; what follows is the encoded data. At the end of the
transmitted frame, several protection bits are added to ensure that the data remaining in
the buffer (corresponding to the shift registers in a hardware design) is decoded
completely.

III. Discussion and Testing Result

The system performance depends on the parameters L, M, N. As L grows,
the probability of a failed frame transmission rises on a fading channel.
The larger M is, the more slowly the system responds to the channel condition.
The larger N (N < M) is, the more frequent code adjustment is. In practice, we
have obtained some valuable data about the optimal parameters over a mobile
channel.

To verify the overall efficiency of this system, we ran tests under the
following conditions: $R_c$ (channel data rate) = 32 kb/s, L = 1024, M = 5, N = 3. Let
the burst error probability be denoted $P_f$, with $P_f = \tau/2T$, where $\tau$ is the burst error
duration and T is the burst error period. Let $P_g$ denote the probability of random error
and $P = P_f + P_g$ the probability of combined burst and random errors. We
transmitted a file of 1920 Kbits over several simulated channels and obtained
the test data listed in Table 1, Table 2, and Table 3. In the
tables, $T_1$ is the time consumed in ARQ mode without error correction, and $T_2$
is the time consumed in the mode described in this paper. The results
show that the system performance is equivalent to that of the ARQ system on a
pure burst error channel, while on the other channel types the system is much
superior to the ARQ system.


Table 3 (T = 1 s)

$P_f$                  $P_g$                $T_1$ (s)   $T_2$ (s)
$1 \times 10^{-3}$     $1 \times 10^{-4}$   96          99
$1 \times 10^{-3}$     [illegible]          438         154
$2.5 \times 10^{-3}$   [illegible]          557         172
$5 \times 10^{-3}$     $1 \times 10^{-2}$   $\infty$    188

Table 1 (T = 1 s)

$\tau$ (ms)   $P_f$                  $T_1$ (s)   $T_2$ (s)
2             $1 \times 10^{-3}$     80          80
5             $2.5 \times 10^{-3}$   82          83
10            $5 \times 10^{-3}$     83          83
30            $1.5 \times 10^{-2}$   85          87

Table 2

$P_g$                $T_1$ (s)   $T_2$ (s)
$1 \times 10^{-4}$   90          91
$1 \times 10^{-3}$   375         141
$5 \times 10^{-3}$   827         166
$1 \times 10^{-2}$   $\infty$    170
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I.  Introduction 

Some  error  probability  estimation  methods  of  a  trellis-coded 
modulation  (TCM)  scheme  using  importance  sampling  have 
been  proposed  [1].  However,  these  methods  are  not  suitable 
for  an  additive  non-Gaussian  noise  channel  case.  The  main 
problem  is  how  to  design  the  probability  density  function  in 
importance  sampling.  We  propose  a  new  design  method  of 
the  probability  density  function  related  to  the  Bhattacharyya 
bound. 

II.  Proposed  Method 

Let $s_1$ and $s_2$ be transmitted signals, and r be the received
signal. Consider a decision system that decides from the received
signal r whether the transmitted signal is $s_1$ or $s_2$.
When the transmitted signal is $s_1$, the indicator
function of the error region $\Phi(\cdot)$ is expressed as

$$\Phi(r) = \begin{cases} 1, & f(r|s_1) < f(r|s_2) \\ 0, & f(r|s_1) \ge f(r|s_2) \end{cases} \tag{1}$$


where $f(\cdot|\cdot)$ is the conditional probability density function.
The ideal probability density function for importance sampling
is proportional to $\Phi(r) f(r|s_1)$. The bound of the function
$\Phi(\cdot)$ is very complex for most conditional probability
density function cases. In the Bhattacharyya bound, we evaluate
the error probability from the upper bound of $\Phi(\cdot)$, that
is, $\sqrt{f(r|s_2)/f(r|s_1)}$. The proposed probability density function
$f^*(r|s_1)$ in importance sampling is designed with almost the
same idea as the Bhattacharyya bound and is given by

$$f^*(r|s_1) \propto \sqrt{f(r|s_1) f(r|s_2)}. \tag{2}$$

When the noise is AWGN, the probability density function
of the proposed method reduces to that of the mean translation
method in [3]. Details of the proposed method are given in Ref. [5].
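
For the AWGN case the reduction to mean translation can be checked numerically: the square root of the product of two Gaussian densities with means $s_1$, $s_2$ and common variance is proportional to a Gaussian with the same variance centred at $(s_1 + s_2)/2$. The sketch below is a one-dimensional illustration only, not the TCM simulator of the paper; the function names, signal values and run count are hypothetical.

```python
import math
import random


def gauss_pdf(x, mean, sigma):
    """Gaussian density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def is_error_estimate(s1, s2, sigma, runs, seed=0):
    """Importance-sampling estimate of P(decide s2 | s1 sent) in 1-D AWGN.

    Samples are drawn from the biased density N((s1 + s2) / 2, sigma^2), which is
    what sqrt(f(r|s1) f(r|s2)) normalises to when both densities are Gaussian.
    """
    rng = random.Random(seed)
    mid = 0.5 * (s1 + s2)
    total = 0.0
    for _ in range(runs):
        r = rng.gauss(mid, sigma)
        if gauss_pdf(r, s1, sigma) < gauss_pdf(r, s2, sigma):   # error region
            total += gauss_pdf(r, s1, sigma) / gauss_pdf(r, mid, sigma)  # weight
    return total / runs


def exact_error(s1, s2, sigma):
    """Closed-form pairwise error probability Q(|s2 - s1| / (2 sigma))."""
    return 0.5 * math.erfc(abs(s2 - s1) / (2 * sigma * math.sqrt(2)))


if __name__ == "__main__":
    print(is_error_estimate(-1.0, 1.0, 0.25, 20000), exact_error(-1.0, 1.0, 0.25))
```

With these hypothetical values the true error probability is of the order of $10^{-5}$, which plain Monte-Carlo sampling would rarely observe in 20000 runs.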

III.  Numerical  Example 
A.  Noise  Model 

As an additive non-Gaussian noise model in the example, an
additive combination of an AWGN of variance $\sigma_g^2$ and an impulsive
noise of Gaussian distribution of variance $\sigma_i^2$, which
is observed with probability $\gamma$ (< 1) per symbol interval,
is used [4]. By taking the convolution of the two probability
density functions, the probability density function of the
additive non-Gaussian noise is rewritten as

$$f(x,y) = \frac{1-\gamma}{2\pi\sigma_g^2} \exp\left\{-\frac{x^2+y^2}{2\sigma_g^2}\right\} + \frac{\gamma}{2\pi(\sigma_g^2+\sigma_i^2)} \exp\left\{-\frac{x^2+y^2}{2(\sigma_g^2+\sigma_i^2)}\right\}$$


B.  Simulation  Results 

The encoder used in the example is the (9, 2, 4) Ungerboeck code
in [2]. As noise parameters, $\gamma = 0.01$ and $\sigma_i = 10\sigma_g$ were used.
We selected 50 error events for the simulation based on the
measure of the smaller Bhattacharyya distance. The number
of simulation runs per error event was 1000. For comparison with
the proposed method, ordinary Monte-Carlo simulation
was also run. It was continued until 200 error bits were observed.

Figure 1 shows BER and variance performance. When
BER < $10^{-4}$, the proposed method reproduces more than
95% of the bit error rate of the ordinary Monte-Carlo method.
The CPU time needed by the proposed method is about
1/85 of that of the ordinary Monte-Carlo method at a BER of $10^{-6}$.
The variance of the simulation result
of the proposed method is almost half of that of the ordinary
Monte-Carlo method for all $E_b/N_0$. Under the condition of
equal variance, the simulation time of the proposed
method is estimated to be reduced to about 1/170 at a BER of $10^{-6}$.

Since it is difficult to generate random numbers following the
probability density function designed by the proposed method,
we approximate the probability density function $f^*(\cdot|\cdot)$
designed by the proposed method.

Fig. 1: BER and variance performance.

Abstract - This paper describes the extended
LFSR (ELFSR) and the extended BRM (EBRM)
based on the field $GF(2^n)$. We claim that the presented
generators are efficient and suitable for S/W
implementation. We also claim that the EBRM can
be used as a good non-linear logic for stream cipher
systems.

I.  Introduction 

A binary rate multiplier (BRM) sequence generator, consisting
of two linear feedback shift registers (LFSRs) of length m and
n respectively, has cryptographically good properties [1]. Under
some constraints, it produces binary sequences of period
$(2^m - 1)(2^n - 1)$ and linear complexity $m(2^n - 1)$. The LFSR
is well known to have good properties [2]; however, it is not
suitable for DSP implementation.

In this paper, we propose the extended LFSR (ELFSR)
based on the field $GF(2^8)$, which can be efficiently and easily
implemented on general-purpose DSPs. We then
present the extended BRM (EBRM) sequence generator, which
consists of two ELFSRs of length m and n respectively and
is based on $GF(2^8)$. It produces byte sequences of period
$(2^{8m} - 1)(2^{8n} - 1)$ and linear complexity $m(2^{8n} - 1)$.

II.  The  Extended  LFSRs 

An ELFSR consists of m memory cells, which together form
the state $(s_0, s_1, \ldots, s_{m-1})$ of the register. The function $f(x)$
is a mapping of $\{GF(2^n)\}^m$ to $GF(2^n)$.

$$f(x) = c_0 \oplus (c_1 \otimes x) \oplus (c_2 \otimes x^2) \oplus \cdots \oplus (c_{m-1} \otimes x^{m-1}) \oplus x^m$$

Fig 1. An ELFSR: $\oplus$ and $\otimes$ denote the operations of addition and
multiplication, respectively, in the ground field $GF(2^n)$.

Property. The period of an ELFSR over $GF(2^n)$ with a primitive
polynomial $f(x)$ of degree m is $2^{mn} - 1$.

If we denote $\alpha$ and $\beta$ in $GF(2^n)$ by $\alpha = (y_1, y_2, \ldots, y_n)$ and
$\beta = (z_1, z_2, \ldots, z_n)$, then the addition of two elements is defined
by $\alpha \oplus \beta = (y_1 \vee z_1, y_2 \vee z_2, \ldots, y_n \vee z_n)$, where $\vee$ denotes
the XOR of two bits. Hence the operation $\oplus$ can
be simply computed by the bitwise XOR of two binary blocks.
However, in general, it is not easy to compute the multiplication
of two elements in $GF(2^n)$. We adopt the method of
multiplication introduced in [3].

Definition.  A  polynomial  over  GF(2n)  is  simple  provided  that 
all  of  its  coefficients  but  the  constant  term  are  either  0  or  1. 


Algorithm 1: The ELFSR
Input. A simple primitive polynomial f(x) of degree m
and two tables defined by the preprocessing. Let
c[k] be the coefficients of f(x) for all 0 ≤ k ≤ m-1.

Step 1. For k = 0, ..., m-1, initialize s[k] by a random
byte.

Step 2. Compute t = c[0] ⊗ s[0].

Step 3. For k = 1, ..., m-1, if c[k] is 1 then t = t ⊕ s[k].

Step 4. For k = m-1 down to 1, set s[k] = s[k-1]. And then,
set s[0] = t.

Step 5. Repeat Step 2 - Step 4 to produce sufficiently
many random bytes.
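
One iteration of Steps 2 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' DSP code: multiplication in $GF(2^8)$ is done by shift-and-add instead of the table method of [3], the reduction polynomial 0x11B is an arbitrary irreducible choice, the coefficient list in the demo is hypothetical and has not been checked for primitivity (so no period claim is made), and the feedback byte t is taken as the output byte, which the algorithm text does not state.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) by shift-and-add (reduction polynomial assumed)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= poly            # reduce modulo the field polynomial
        b >>= 1
    return result


def elfsr_step(state, c):
    """Apply Steps 2-4 once; return (new_state, output_byte).

    state -- list of m bytes s[0..m-1]
    c     -- coefficients c[0..m-1]; c[0] is any byte, the others are 0 or 1
    """
    t = gf_mul(c[0], state[0])               # Step 2
    for k in range(1, len(state)):           # Step 3
        if c[k] == 1:
            t ^= state[k]
    new_state = [t] + state[:-1]             # Step 4: shift and insert t
    return new_state, t


if __name__ == "__main__":
    state, coeffs = [1, 2, 3], [2, 0, 1]     # hypothetical demo values
    for _ in range(4):
        state, out = elfsr_step(state, coeffs)
        print(out, state)
```

Because every coefficient except c[0] is 0 or 1, each step costs one field multiplication plus a few byte XORs, which is the point of restricting f(x) to a simple polynomial.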

III.  The  extended  BRM 

Now we present an extended BRM sequence generator, which
consists of two extended LFSRs of length m and n respectively
and is based on $GF(2^8)$. It produces byte sequences
of period $(2^{8m} - 1)(2^{8n} - 1)$ and linear complexity $m(2^{8n} - 1)$.

Algorithm 2: The extended BRM

Input. Two extended LFSRs SR1 and SR2 of length m,
n respectively.

Step 1. Initialize all arrays of the two ELFSRs by random
bytes.

Step 2. At time = t, the two ELFSRs are both clocked.

Step 3. If the output of SR1 is odd, SR2 is then clocked
one more time.

Step 4. Repeat Step 2 - Step 3 to produce a sufficiently
large number of random bytes.
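
The clock-control rule of Steps 2 and 3 can be sketched independently of the register internals. In the sketch below each register is represented by a hypothetical zero-argument callable that clocks it once and returns its output byte, and the last byte produced by SR2 is taken as the generator output; the algorithm text does not say which register supplies the output, so that choice is an assumption.

```python
def ebrm_bytes(clock_sr1, clock_sr2, count):
    """Produce `count` output bytes using the clock-control rule of Algorithm 2.

    clock_sr1, clock_sr2 -- callables that clock a register once and return a byte
    """
    out = []
    for _ in range(count):
        a = clock_sr1()          # Step 2: both registers are clocked
        b = clock_sr2()
        if a & 1:                # Step 3: odd SR1 output, clock SR2 once more
            b = clock_sr2()
        out.append(b)            # assumed: SR2 supplies the output byte
    return out


if __name__ == "__main__":
    # Stand-in byte sources instead of real ELFSRs, for illustration only.
    sr1 = iter([1, 2, 3]).__next__
    sr2 = iter([10, 11, 12, 13, 14]).__next__
    print(ebrm_bytes(sr1, sr2, 3))
```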

IV.  Concluding  Remarks 
In this paper, we proposed the ELFSR based on the field
$GF(2^n)$, which can be efficiently and easily implemented on
general-purpose DSPs. We then presented the EBRM
sequence generator, which consists of two ELFSRs of length
m and n respectively, is based on $GF(2^8)$, and can therefore be
implemented efficiently on DSPs. We are now examining the security
and efficiency of the proposed generators.
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I.  Summary 

Galois  or  finite  fields  have  applications  in  cryptography  and 
coding  theory.  For  example,  both  encoding  and  decoding  of 
Reed-Solomon  codes  require  computations  in  the  field  over 
which  the  code  is  defined.  Among  the  different  arithmetic 
operations  in  finite  fields,  multiplicative  inversion  (hereafter 
called  simply  inversion)  has  been  identified  as  the  most  compli¬ 
cated  operation.  Recently,  several  approaches  have  been  made 
to  compute  the  inverse  efficiently.  The  approaches  which  have 
been  given  considerable  attention  in  the  literature  are  based 
on  either  Euclid’s  algorithm  [1],  or  Fermat’s  theorem  [2]  or 
solution  of  a  set  of  linear  equations  [3].  The  latter  approach 
is  used  in  our  present  work  to  compute  inverses. 

Let $f(x) = \sum_{i=0}^{m} f_i x^i$ be a monic irreducible polynomial of
degree m over GF(2) so that $GF(2^m) = GF(2)[x]/f(x)$. Let
$\alpha$ be an element of $GF(2^m)$ that satisfies $f(\alpha) = 0$. $GF(2^m)$
can be viewed as a vector space of dimension m over GF(2),
and the canonical basis $(1, \alpha, \ldots, \alpha^{m-1})$ is a vector A over
$GF(2^m)$. Let $M = [M_{i,j}]$ with

$$M_{i,j} = \begin{cases} f_{i+j+1}, & 0 \le i + j \le m - 1 \\ 0, & m \le i + j \le 2m - 2. \end{cases} \tag{1}$$

Then B = AM is the vector of the triangular basis corresponding
to the canonical basis [4]. Any element $c \in GF(2^m)$
can be written uniquely as $c = c_a A^t = c_b B^t$, where $c_a$ and
$c_b$ are the vectors of coordinates of c with respect to the
canonical and triangular bases, respectively.

Let  a  be  any  nonzero  element  of  GF(2m)  and  b  be  the 
inverse  of  a.  Then  it  can  be  shown  that 


m  — 1 

^  ^  +  j  —  dj,m- 1  j  —  0,  1,  •  •  •  ,  m  —  1,  (2) 

*=0 


where $\delta_{ij}$ is the Kronecker delta function, which is equal to 1
when i = j and 0 otherwise, and

(as).  i  =  0, 1,  •  •  *  ,m  —  1 

-1  (*) 
Y  sj+i-rnfj  i  =  m,  ra  +  1,  •  •  • ,  2m  —  2. 

,  i=° 

Let  h  —  b  —  am.  Then  it  can  also  be  shown  that 


I 

f  3  j + m  j  —  0,1,.“,  m  —  2 

2^  s>+i  (Mi  =  < 

i 

lx. 

(4) 

i=0  | 

1  si+rn  3=m-  1, 

where  st+m  =  sJ+m  +  1.  Now  the  shift  register  synthesis 
algorithm  of  [5]  can  be  used  to  solve  (4)  and  hence  to  compute 
the  inverse  of  a. 

While  the  coordinates  of  a  are  taken  with  respect  to  the 
triangular  basis,  those  of  b  are  obtained  with  respect  to  the 
canonical  basis.  The  use  of  these  two  bases  has  been  exploited 


to realize efficient finite field arithmetic operations [4]. A basis
change, if required, can however be performed using simple
linear feedback and feed-forward shift registers.

The area-time complexity for the inverter is $O(m^2 \log m)$.
For an arbitrary field $GF(2^m)$ the inverter has the least circuit
complexity compared to the recently proposed ones, for
example, [1], [2] and [3].
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On  the  Probability  of  Undetected  Error  and  the  Computational 
Complexity  to  Detect  an  Error  for  Iterated  Codes 

Abstract - We discuss the practical and
asymptotic capabilities of iterated codes used as
error detecting codes. Throughout this paper,
we assume that the codes are binary linear
block codes and that the channel is the binary symmetric
channel with cross-over probability $\varepsilon$.

I.  Iterated  Codes 

Let $\otimes$ be the direct product; then $(N_0, K_0)$ iterated
codes $C_I^{(s)}$ are constructed by $c_1 \otimes c_2 \otimes \cdots \otimes c_s$,
where $c_i$ is the i-th stage $(n_i, k_i)$ code,
and integer $s \ge 2$. The method for detecting any
errors is the same as the method for correcting any errors
of $C_I^{(s)}$. The decoding of the component code
is only to detect any errors. If all syndromes of all
component codes are zero, the received sequence
of length $N_0$ is regarded as a transmitted codeword
of $C_I^{(s)}$ and is accepted by the receiver. Under
the condition below, $C_I^{(s)}$ are asymptotically
bad codes.

Lemma  1  For  s  — »  00,  any  e  >  0,  and  some 
J  <  i:j.  the  necessary  and  sufficient  condition  to 

(5) 

construct  C }  ;  whose  code  rate  Ro ,  0  <  Ro  <  1 
is  given  by  \jf-  -  1|  <  c,  where  Rt  =  I]/=i  rh 

=  nLi  ri> an<^  R°  ~ 

II.  Estimation  Of  Iterated  Codes 

Definition  1  We  define  the  complexity  of  the  op¬ 
eration  required  to  detect  an  error  by  the  product 
of  the  total  number  of  shifts  and  the  number  of 
stages  of  the  shift  register  to  divide  the  polyno¬ 
mial  of  a  received  sequence. 

tems Engineering, College of Engineering, Hosei University,
3-7-2, Kajinocho, Koganei-shi, Tokyo, 184 JAPAN. E-mail:
nishi@nishi.is.hosei.ac.jp

°S.  Hirasawa  is  with  Department  of  Industrial  Engineer¬ 
ing  and  Management,  School  of  Science  and  Engineering, 
Waseda  University,  3-4-1,  Ohkubo,  Shinjuku-ku,  Tokyo, 
169  JAPAN. 


Theorem 1 Let $\chi_I^{(s)}$ be the complexity of the operation
required to detect an error for $C_I^{(s)}$. Then,
$n_{\min}(N_0 - K_0) \le \chi_I^{(s)} \le n_{\max}(N_0 - K_0)$, where
$n_{\max} = \max(n_1, n_2, \ldots, n_s)$, and $n_{\min} = \min(n_1, n_2, \ldots, n_s)$.

Corollary  1  For  C'f'1  as  0  <  Ro  <  1,  and  $  — 
oo,  we  have  0(No)  <  xT  <  0(Nq). 

Let  Pjs>(e)  be  the  probability  of  undetected  er¬ 
ror  for  Cj.  Then,  by  utilizing  the  structure  of 
fs) 

Cj  '  constructed  by  direct  product  of  s  codes  a 
whose  ni  is  very  small,  comparing  with  Aro,  we 
are  able  to  calculate  the  exact  value  of  P[S\e). 

Theorem  2  By  iterating  the  recurrent  calcula¬ 
tion  until  the  stage  s  —  1,  finally  we  can  have 
Us)(£)  =  [A/S-1)(£i-i)]Ls  -  (1  -  £5_i  )Ay  where 

1)  =  Asj 

is  the  number  of  codewords  of  Hamming  weight 
j  in  code  c5,  ss-i  is  the  average  error  probabil¬ 
ity  per  bit  at  stage  s  —  1,  ATS=  nsks-i  •  •  *&i,  and 
L s—  ks~ \hs~2m  *  *&i* 

Corollary  2  For  0  <  Ro  <  1,  and  s  — *  oo, 
P\s\eu)^  0. 

III.  Conclusion 

The complexity of error detection for $C_I^{(s)}$ is lower
than that for conventional single-stage codes
c under the same probability of undetected error,
code length, and code rate. The complexity
for $C_I^{(s)}$ is also asymptotically lower than
that for c.

The exact value of the probability of undetected
error for $C_I^{(s)}$ can always be calculated. Furthermore,
it is explicitly shown that this probability
converges to zero for $s \to \infty$.
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Abstract  -  A  novel  reduced-complexity  trellis 
decoding  algorithm  is  described.  The  new  algorithm, 
called  Wavefront  Decoding  (WD),  avoids  the  through¬ 
put  bottleneck  caused  by  metric  and  state-infor¬ 
mation  feedback,  which  characterizes  previously 
known  breadth-first  decoding  algorithms.  The  error 
performance  of  WD  for  trellis-coded  8PSK  on  AWGN 
and  Rayleigh  fading  channels  is  investigated  by 
simulation.  The  results  indicate  that  for  a  given 
number  of  survivor  paths,  the  performance  of  WD  is 
comparable,  although  necessarily  inferior,  to  that  of 
the  M-algorithm.  However,  in  contrast  to  the  M- 
algorithm,  WD  exhibits  a  high  degree  of  temporal 
parallelism,  rendering  it  suitable  for  high  speed 
applications. 

I.  Introduction 

The  well-known  M-algorithm  [1]  [2]  is  optimal  in  the 
sense  that  it  minimizes,  for  any  given  number  of  survivor 
paths,  the  probability  of  rejecting  the  transmitted  path  [3]. 
However,  the  M-algorithm  suffers  from  two  structural 
deficiencies.  First,  the  cost  of  survivor  selection  in  terms  of 
cycle  and  gate  count  will  always  be  high.  Second,  due  to  the 
existence  of  a  feedback  loop  in  the  decoder,  in  which  the 
metrics  and  states  of  recursion  n  are  fed  back  to  be  used  in 
recursion  n  + 1,  the  M-algorithm  is  incapable  of  simul¬ 
taneously  processing  paths  over  several  trellis  stages.  This 
excludes  the  use  of  the  M-algorithm  in  high-speed 
applications,  which  require  extensive  pipelining.  In  this 
paper,  we  show  that  by  generalizing  the  concept  of 
breadth-first  decoding,  the  feedback  loop  in  the  decoder 
may  in  fact  be  broken  up  to  support  pipelining  over  several 
trellis  stages.  Moreover,  we  find  that  for  the  decoding  of 
short  blocks,  the  survivor  selection  can  be  carried  out  at  a 
cost  significantly  lower  than  in  the  M-algorithm.  The  price 
paid  is  a  modest  deterioration  of  error  performance. 

II.  Wavefront  Decoding 

Consider  first  a  breadth-first  trellis  decoder  operating 
with  C  search  paths  selected  from  C  state-classes.  To 
proceed  forward,  the  decoder  first  stores  all  successors  of 
the  old  survivor  paths  in  C  lists  associated  with  the  C 
state-classes.  Next,  the  best  path  from  each  list  is  extracted 
to  become  a  new  survivor.  We  shall  refer  to  a  group  of  C 
paths  that  propagate  through  the  trellis  in  this  fashion  as  a 
wavefront.  Hence,  in  our  notation  the  reduced-state 
sequence  decoder  (RSSD)  considered  in  [3]  and  by  several 
other  authors  is  a  single  wavefront  decoder.  Consider  next 
a  decoder  operating  with  2C  search  paths  divided  in  two 
wavefronts,  each  one  consisting  of  C  paths.  The  two 
wavefronts  walk  in  file  through  the  trellis,  with  the  second 
one  following  immediately  behind  the  first.  To  advance 
from  time  n  to  time  n  + 1 ,  the  decoder  first  generates  and 
stores  all  successors  of  the  C  paths  in  the  first  wavefront.  C 
survivors  are  then  extracted  from  the  lists.  These  paths 
constitute  the  first  wavefront  at  time  n  +  1.  Next,  the 
successors  of  the  C  paths  in  the  second  wavefront  are 
appended  to  the  lists  and  C  additional  survivors  are 
extracted  to  become  the  second  wavefront  at  time  n  +  1. 
Notice  that  the  second  wavefront  selects  its  survivors  both 
from  its  "own"  successors  and  from  those  that  were  left 
over by the first wavefront. By introducing additional
wavefronts in the same fashion, we obtain a decoder
which, in the general case, operates with BC search paths
divided into B wavefronts. We refer to this decoding
principle as Wavefront Decoding (WD). Characteristic of
WD is the fact that a wavefront, having arrived at stage n
in the trellis, may directly select its survivors and then
proceed forward to stage n + 1 without waiting for the
arrival of those paths that follow behind. Hence it can be
seen that feedback of metrics and state-information only
appears internal to each wavefront. The processing of the
wavefronts may now be pipelined over several trellis
stages to obtain a linear speedup.
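
The advance of B wavefronts by one trellis stage can be sketched with one candidate list per state-class. This is a sequential illustration of the selection rule only, not a pipelined implementation: the function and argument names are hypothetical, paths are represented as (metric, states) tuples, and a lower metric is assumed to be better.

```python
import heapq


def advance(wavefronts, successors, state_class, num_classes):
    """Advance all wavefronts from stage n to stage n + 1.

    wavefronts  -- list of wavefronts, leading one first; each is a list of paths
    successors  -- function mapping a path to its extended paths
    state_class -- function mapping a path to its state-class index
    """
    lists = [[] for _ in range(num_classes)]     # the C candidate lists
    advanced = []
    for front in wavefronts:
        for path in front:
            for cand in successors(path):
                heapq.heappush(lists[state_class(cand)], cand)
        # Extract the best remaining candidate of each list; leftovers stay
        # available to the wavefronts that follow.
        advanced.append([heapq.heappop(lst) for lst in lists if lst])
    return advanced


# Toy two-state trellis with hypothetical branch costs, for illustration only.
COST = {(0, 0): 0, (0, 1): 2, (1, 0): 1, (1, 1): 3}


def toy_successors(path):
    metric, states = path
    return [(metric + COST[(states[-1], nxt)], states + (nxt,)) for nxt in (0, 1)]


def toy_class(path):
    return path[1][-1]


if __name__ == "__main__":
    fronts = [[(0, (0,)), (5, (1,))], [(7, (0,)), (9, (1,))]]
    print(advance(fronts, toy_successors, toy_class, 2))
```

In the toy run the second wavefront ends up with two candidates left over by the first one, which is the behaviour described in the text.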

Assuming  that  the  correct  path  starts  out  in  the  first 
wavefront,  it  will  eventually,  as  a  result  of  channel  noise, 
start  to  fall  back  in  rank,  from  the  first  wavefront  to  the 
second,  then  to  the  third  and  so  on,  until  it  reaches  the  last 
wavefront  where  ultimate  rejection  awaits.  The  only  way 
to  escape  from  a  certain  loss  of  the  correct  path  is  to 
occasionally  have  the  first  wavefront  stop  and  wait  for  the 
other  wavefronts  to  arrive.  Once  the  members  of  all  waves 
have  been  accumulated  in  the  C  lists,  B  repeated  selections 
are  made  from  each  list  to  produce  B  new  wavefronts.  The 
correct  path  now  gets  a  chance  to  recapture  its  position  in 
the  first  wave.  Obviously,  wavefront  accumulation  will 
reduce  throughput,  since  the  pipeline  is  broken  up. 
Fortunately, it turns out that the time between accumulations
$L_A$ can be made fairly large without seriously
degrading error performance. In particular, when data is
encoded in short blocks (<100 symbols), the accumulation
of wavefronts need only be carried out at the end of the
block. Notice that for the degenerate case $L_A = 1$, WD
becomes the Generalized Viterbi Algorithm (GVA) [4].

The  error  performance  of  WD  has  been  simulated  for 
rate  2/3  trellis-coded  8PSK  on  AWGN  and  Rayleigh 
fading channels. In all cases, C = 4 and L_A = 64 have been
used. In general, it is observed that WD exhibits a certain
performance degradation relative to GVA (with C = 4)
and the M-algorithm with the same number of search paths.
This is to be expected, since the selection of survivor
combinations in WD (for L_A > 1 and C > 1) is more
constrained than in the two other algorithms. However, in
all cases considered here, the degradation is within a
fraction of a dB.
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Abstract  —  An  error  correction  procedure  for  linear 
block codes is presented which corrects errors beyond half the
minimum distance. The algorithm is based on minimizing a
real-valued function, called the potential. Since the potential
decreases monotonically with decreasing weight of the error
vector, minimization of the potential can be done by local search.


I.  Introduction 

Besides the well-known algebraic decoding of linear block codes
there exist several non-algebraic approaches to error correction.
In [1] a maximum-likelihood algorithm for linear block codes was
presented which has exponential complexity. In [2] the minimum
weight words are used as decoding vectors for binary codes.

The following algorithm is applicable to all linear block codes
and uses a statistical decoding approach based on the so-called
"potential".

II.  Notation 

The N-dimensional vector space over the q-element Galois field
GF(q) will be denoted by $GF(q)^N$. For two vectors $a, b \in GF(q)^N$
the inner product S(a,b) is defined by $S(a,b) := \sum_{i=1}^{N} a_i b_i$.

The code vectors of the code C are denoted by c, the error vector
by e and the vectors of the dual code C' by c'. Using r = c + e with
$r, c, e \in GF(q)^N$ and S(c, c') = 0, the $q^{N-K} - 1$ parity-check
equations are defined by $A_j := S(r, c'_j)$. Furthermore wt(a) is the
Hamming weight of $a \in GF(q)^N$ and $d_{\min}$ the minimum distance.


III.  Potential-Decoding 

A model is presented which is capable of structuring the vector
space $GF(q)^N$. In this model a function is defined, called the
potential, which can be regarded as a measure for the distance of any
vector to its nearest code vector. Various decoding algorithms can be
derived from this model. The potential U(r) of an arbitrary vector r
is defined as:

\[ U(r) := \sum_{j=1}^{q^{N-K}-1} \alpha_j I_j, \qquad I_j = \begin{cases} 1, & \text{if } A_j \neq 0 \\ 0, & \text{if } A_j = 0 \end{cases} \tag{1} \]

$I_j$ stands for an indicator variable for the parity-check equation $A_j$.
$\alpha_j$ is a weighting factor which depends on the parity-check vector
$c'_j$. The characteristics of the potential U(r) are:


\[ U(c) = 0 \tag{2} \]

\[ U(r \notin C) > 0 \tag{3} \]

\[ U(r) = U(c + e) = U(e) \tag{4} \]

\[ U(e_2) < U(e_1), \quad \text{if } wt(e_2) < wt(e_1) < d_{\min}/2 \tag{5} \]


Although eq. (5) holds only up to $d_{\min}/2$ it can be shown that
statistically this property is valid up to considerably higher error
numbers. Assuming that $\alpha_j \in \mathbb{R}$ is only dependent on the weight
$L = wt(c'_j)$ of the vector $c'_j$ gives:

\[ \alpha_L := \alpha(c'_j \mid wt(c'_j) = L). \tag{6} \]

With this assumption U(r) is separable into subpotentials $U_L(r)$.
Every subpotential $U_L(r)$ consists of the $m_L$ parity-check vectors of
weight L.

\[ U_L(r) = \alpha_L \sum_{j=1}^{m_L} I_{j,L}, \qquad U(r) = \sum_{L=0}^{N} U_L(r). \tag{7} \]
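
The potential of eq. (1) can be illustrated with a small binary example. The following is a minimal sketch, not the authors' implementation: it assumes q = 2, uses the (7,4) Hamming code as a stand-in code, takes all weights $\alpha_j = 1$ by default, and uses a greedy single-bit-flip descent as one possible form of local search. The names `dual_codewords`, `potential` and `local_search_decode` are hypothetical.

```python
from itertools import product


def dual_codewords(parity_rows):
    """All non-zero GF(2) combinations of the parity-check rows (the 2^(N-K) - 1 checks)."""
    n = len(parity_rows[0])
    words = []
    for coeffs in product((0, 1), repeat=len(parity_rows)):
        if not any(coeffs):
            continue  # skip the all-zero dual word
        word = [0] * n
        for coeff, row in zip(coeffs, parity_rows):
            if coeff:
                word = [w ^ b for w, b in zip(word, row)]
        words.append(word)
    return words


def potential(r, checks, alpha=lambda weight: 1.0):
    """U(r): sum of alpha over all parity checks A_j = S(r, c'_j) that are non-zero."""
    total = 0.0
    for check in checks:
        a_j = sum(ri & ci for ri, ci in zip(r, check)) % 2
        if a_j != 0:
            total += alpha(sum(check))  # alpha depends only on wt(c'_j), as in eq. (6)
    return total


def local_search_decode(r, checks):
    """Greedy descent (an assumed local search): flip the bit that lowers U the most."""
    current = list(r)
    while True:
        u_now = potential(current, checks)
        if u_now == 0:
            return current  # U = 0 only for code vectors, eqs. (2)-(3)
        best_u, best_pos = u_now, None
        for i in range(len(current)):
            current[i] ^= 1
            u = potential(current, checks)
            current[i] ^= 1
            if u < best_u:
                best_u, best_pos = u, i
        if best_pos is None:
            return current  # stuck in a local minimum
        current[best_pos] ^= 1


# Parity-check matrix of the (7,4) Hamming code (illustrative choice).
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
CHECKS = dual_codewords(H)
```

For this toy code a single error violates four of the seven checks, and one flip at the error position returns the potential to zero.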


It  can  be  shown  [3],  that  the  mean  value  of  UL  is  given  by: 


UL(e|wt(e)  =  t)  =mL^p 


1- 


\1_q-7 


(8) 


For efficient decoding it is not necessary to use all $q^{N-K} - 1$
parity-check equations. Table 1 shows the decoding performance for
several codes using only the two subpotentials $U_L$ with the maximum
and minimum weight vectors.


Error       (31,11,11)   (63,24,15)   (113,57,15)
Number t    BCH-code     BCH-code     QR-code

≤5          0 %          0 %          0 %
6           33.3 %       0 %          0 %
7           87.3 %       0 %          0 %
8           -            1.5 %        0.06 %
9           -            9.5 %        0.6 %
10          -            36.5 %       1.8 %
11          -            72.2 %       12.2 %
12          -            95.0 %       29.2 %

Table  1:  Percentage  of  decoding  errors  of  weight  t. 

Figure 1 shows the performance of decoding with the subpotential
$U_{98}$ for a (113,57,15) QR-code compared with Bounded Minimum
Distance (BMD) decoding and with a rate 1/2 convolutional code
(K=7) with Viterbi decoding. Potential-decoding is very well
suited to VLSI implementation. To reach the decoding performance
of $U_{98}$, only 50 000 gates of an ASIC are necessary, up
to data rates of approximately 10 Mbit/s.


Figure 1: Bit error probability of $U_{98}$ for the (113,57,15) QR-code
on an AWGN channel with BPSK modulation.
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Abstract  —  General  decoding  algorithms  for  lin¬ 
ear  codes  that  have  less  complexity  than  exponen¬ 
tial  search  have  been  studied  by  many  researchers 
and  exact  complexities  are  known  for  the  memoryless 
channel  [1-4] .  Among  the  various  decoding  strategies 
for  linear  codes,  the  information  set  decoding  algo¬ 
rithm  has  complexity  that  is  significantly  lower  than 
that  for  most  other  general  algorithms  over  most  code 
rates  [3,4].  It  is  the  purpose  of  this  paper  to  derive 
the  complexity  for  information  set  decoding  used  in 
channels  where  errors  may  occur  in  bursts,  and  to 
quantify  the  gain  in  complexity  over  the  memoryless 
channel  case. 

I.  Introduction 

Errors  encountered  in  many  communication  channels  are 
not  independent  but  appear  in  bursts.  One  way  to  effectively 
model  bursty  channels  is  to  assume  that  the  channels  have  two 
states  with  different  probabilities  of  channel  error  [5].  The 
channels we consider have probability $\pi_g$ to be in the good
state and probability $\pi_b$ ($= 1 - \pi_g$) to be in the bad state. The
error probability for the good state is assumed to be r times
the error probability for the bad state, where 0 < r < 1. The
Gilbert-Elliott channels are described by a two-state Markov
chain model, and state transitions depend on the transition
probabilities. We define the complexity exponent F(R) of decoding
algorithms for binary linear codes of rate R as

\[ F(R) = \lim_{n \to \infty} \frac{1}{n} \log_2 M(n, R) \]

where M(n, R) is the number of computations necessary.

II.  Information  Set  Decoding  on  Bursty 
Channels 

For  the  bursty  channel  with  deterministic  state  transitions, 
the  complexity  exponent  Fd(R)  of  the  information  set  decod¬ 
ing  that  gives  error  probability  no  greater  than  twice  the  error 
probability  of  maximum  likelihood  decoding  is  given  by 

Fd(R)  =  (1  -  R)  -  (1  -  R)H 
when $R > \pi_g(1 - r)$, and

Fd(R)  =  *9H(rp)  -  -  R)H 

otherwise, where H(·) is the binary entropy function and p is
the value satisfying

\[ 1 - R = \pi_b H(p) + \pi_g H(rp). \]

¹This work was supported in part by NSF Grant NCR-9115969.


The obtained complexity is shown to be strictly less than the
complexity exponent for the memoryless channel for the entire
range of code rates and channel parameters $\pi_b$, $\pi_g$, and
r. When r = 1, $F_d(R)$ becomes identical to the complexity
exponent for the memoryless channel. The gain in complexity
gets larger as r gets closer to 0, i.e., when the channel error
probabilities for the two states differ by a larger amount. The
optimal way to select information sets is to choose $\beta nR$ bad
state symbols and $(1 - \beta)nR$ good state symbols, where $\beta$ is
given by

\[ \beta = \frac{\pi_b \left( R - \pi_g (1 - r) \right)}{(\pi_b + \pi_g r) R} \]

when $R > \pi_g(1 - r)$, and $\beta = 0$ otherwise. Bounds on the
complexity  exponent  Fge(R)  for  Gilbert-Elliott  channels  can 
be  achieved  by  modifying  the  result  for  the  channel  with  de¬ 
terministic  state  transitions.  We  obtain 

\[ F_d(R) \le F_{ge}(R) \le F_d(R) + A(b, g) \]

where  b  is  the  transition  probability  from  the  good  state  to  the 
bad  state,  g  is  the  transition  probability  from  the  bad  state  to 
the  good  state,  and  A (b,  g)  =  H (-j^)  + ^ (g) >  The  bounds 
become tight when the channel transitions take place slowly;
we have A(b, g) ≈ 0 for small b and g.
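
As a small numerical illustration of the information-set split described above, the sketch below evaluates the fraction $\beta$ of bad-state symbols. It assumes the reading of the formula in which $\beta$ vanishes exactly at the threshold $R = \pi_g(1 - r)$; the function name `bad_state_fraction` is hypothetical.

```python
def bad_state_fraction(rate, pi_b, r):
    """Fraction beta of information-set symbols taken from the bad state.

    rate : code rate R (0 < R <= 1)
    pi_b : probability of the bad state (pi_g = 1 - pi_b)
    r    : ratio of good-state to bad-state error probability
    """
    pi_g = 1.0 - pi_b
    threshold = pi_g * (1.0 - r)
    if rate <= threshold:
        return 0.0  # the information set uses good-state symbols only
    return pi_b * (rate - threshold) / ((pi_b + pi_g * r) * rate)


# With r = 1 the two states are indistinguishable and beta reduces to pi_b.
beta_memoryless = bad_state_fraction(0.5, 0.3, 1.0)
```

Two sanity checks follow from the formula: at R = 1 every symbol is an information symbol, so $\beta = \pi_b$, and for r = 1 the split is proportional to the state probabilities at every rate.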

It is possible to improve the bounds for $F_{ge}(R)$ when side
information such as soft-decision information is available to
the decoder. The extra complexity in the upper bound on
$F_{ge}(R)$, when compared to $F_d(R)$, is due to the state
estimation of the received sequences. By using soft-decision
information, we can effectively estimate which symbols are
transmitted through the good state and which through the bad
state. One scheme for state sequence estimation is to choose the
$n\pi_g$ most reliable symbols out of a given sequence of n symbols,
and assume that these are the symbols that have been transmitted
through the good state. For the Gilbert-Elliott channel with
soft-decision information available, we achieve complexity
exponents very close to $F_d(R)$ even when state transitions
occur frequently.
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Abstract  —  Many  algorithms  for  soft  and  near-soft 
decision  decoding  of  block  codes  start  by  implement¬ 
ing  hard  decision  decoding.  In  several  instances  it 
has  been  noted  that  simple  tests  of  the  hard  decision 
result  may  allow  the  algorithm  to  terminate  at  this 
point.  This  paper  explores  this  notion  in  detail. 

I.  Introduction 

Consider the problem of decoding an $(n, k, d_{\min})$ binary block
code with codewords $c_j$. Assume that antipodal signaling,
$s_j = \sqrt{E}(2c_j - 1)$, and additive Gaussian noise (zero mean,
variance $\sigma^2$) produce the channel observation

\[ x = s_j + n. \]

To minimize the probability of error the two standard decoding
techniques are soft decision and hard decision decoding
(with resulting codewords $c_s$ and $c_h$ and performances
$P_e(\text{soft})$ and $P_e(\text{hard})$, respectively). Soft decision decoding,
while providing optimum performance, is computationally
burdensome. Hard decision decoding has a significantly
reduced implementation complexity at reduced performance.
During the last 30 years many authors have searched the middle
ground for high performance, low complexity approaches.

Many of these approaches start with hard decision decoding,
searching the nearby codespace for a best choice of codeword.
It has been noted that such algorithms can terminate
early if the data x and the hard-decision result $c_h$ together
satisfy certain conditions. We envision, then, a decoder with
operation:

1. Hard-decision decoding is implemented yielding $c_h$.

2. A test is performed to see whether $c_h$ matches $c_s$ (without,
of course, directly finding $c_s$). If the answer is yes, the
decoding algorithm terminates at this point.

3. If the test of step 2 fails, full soft decision decoding or
some other strategy is implemented.

Without actually implementing soft decision decoding, the
test in step 2 has three possible answers: yes, no, and the
data is inconclusive. A "yes" response is called a success for
the test; conversely, a "no" or "data inconclusive" response
is a failure in that additional processing would be required
before decoding is complete.

The motivation for such tests is that since hard decision
decoding is correct a relatively high percentage of the time, it
often matches the soft decision decoding result exactly. This
idea can be made more mathematically formal. Specifically,
it can be shown that

\[ P_e(\text{hard}) - P_e(\text{soft}) \le \Pr(c_h \neq c_s) \le P_e(\text{hard}) + 2 P_e(\text{soft}). \]

Since $P_e(\text{soft})$ is typically much smaller than $P_e(\text{hard})$,
$\Pr(c_h \neq c_s) \approx P_e(\text{hard})$. Thus, the failure probability for any
test for step 2 is approximately lower bounded by $P_e(\text{hard})$.
An efficient test should fail only about as frequently as hard
decision decoding makes an error.


As an example of a test of $c_h = c_s$, consider the following
well known condition:

The Codeword Test - If the hard decision decoder's input
is already a codeword, then $c_h = c_s$.

Unfortunately, this result is far from the lower bound on the
failure probability. Several tests for step 2 are described below
with emphasis on the coherent Gaussian channel. Additional
details, including tight upper and lower bounds to
performance for these tests, are presented in [3].

II.  Tests  for  the  AWGN  Channel 

The first test has been mentioned previously [2]:

The Hypersphere Test - If x is within $\sqrt{d_{\min} E}$ units (in
Euclidean distance) of the hard decision decoded signal then
$c_h = c_s$.

Realizing that the actual soft decision decoding region is a
convex cone, the test region can be expanded from a hypersphere
to the circumscribing right circular cone:

The Circular Cone Test - If x satisfies

\[ \frac{x (2c_h - 1)^T}{\sqrt{n\, x x^T}} \ge \sqrt{\frac{n - d_{\min}}{n}} \tag{1} \]

then x falls within the aforementioned cone and $c_h = c_s$.

While the cone test completely encloses the hypersphere test,
the cone and codeword tests do have different support; hence,
it seems reasonable to combine them:

The Combined Test - If the hard decision decoder's input
is already a codeword or if the received vector x satisfies the
cone inequality in (1) then $c_h = c_s$.

Algebraic analysis of the soft decision decoding operation [1]
yields a further test:

The Polygonal Cone Test - Define $z_i$, i = 1, 2, ..., n, by

\[ z_i = \begin{cases} +x_i & \text{if } c_{h,i} = 0 \\ -x_i & \text{if } c_{h,i} = 1 \end{cases} \]

If the sum of the $d_{\min}$ largest $z_i$ does not exceed zero then
$c_h = c_s$.

This test subsumes all of the above tests with some increase in
complexity. The resulting performance can be quite good.
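
The tests above translate almost directly into code. The following is a minimal sketch under stated assumptions: x is a list of real channel observations, `c_h` is the hard-decision decoded codeword as a list of 0/1 values, a positive observation is hard-decided as bit 1 (matching $s_j = \sqrt{E}(2c_j - 1)$), and the function names are hypothetical.

```python
import math


def codeword_test(x, c_h):
    """Codeword Test: the bitwise hard decisions on x already equal c_h."""
    hard = [1 if xi > 0 else 0 for xi in x]
    return hard == list(c_h)


def circular_cone_test(x, c_h, d_min):
    """Circular Cone Test: x(2c_h - 1)^T / sqrt(n x x^T) >= sqrt((n - d_min) / n)."""
    n = len(x)
    correlation = sum(xi * (2 * ci - 1) for xi, ci in zip(x, c_h))
    energy = sum(xi * xi for xi in x)
    if energy == 0:
        return False  # the all-zero observation gives no direction
    return correlation / math.sqrt(n * energy) >= math.sqrt((n - d_min) / n)


def polygonal_cone_test(x, c_h, d_min):
    """Polygonal Cone Test: the d_min largest z_i must not sum above zero."""
    z = [xi if ci == 0 else -xi for xi, ci in zip(x, c_h)]
    return sum(sorted(z, reverse=True)[:d_min]) <= 0


# Example: all-zero codeword of a (7,4,3) code, one unreliable position.
x_obs = [-1.0, -0.8, -1.2, 0.1, -0.9, -1.1, -0.7]
c_hard = [0] * 7
results = (
    codeword_test(x_obs, c_hard),
    circular_cone_test(x_obs, c_hard, 3),
    polygonal_cone_test(x_obs, c_hard, 3),
)
```

In this example the codeword test fails because position 4 is hard-decided as a 1, while both cone tests succeed, which illustrates why the cone and codeword tests have different support.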
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Abstract  —  In  this  paper,  different  results  related  to 
the  ordering  of  a  sequence  of  N  received  symbols  with 
respect  to  their  reliability  measure  are  presented  for 
BPSK  transmission  over  the  AWGN  channel. 

I. Approximation of $P_e(n_1, \ldots, n_j; N)$

For BPSK transmission over the AWGN channel, many maximum
likelihood decoding (MLD) algorithms for binary linear
block codes first reorder the received symbols within each
block with respect to their reliability. In [1], the statistics of
the noise after ordering are derived. These statistics allow one
to tightly bound the error performance of any suboptimum
algorithm based on reordering.

After ordering a sequence of N symbols, the probability
$P_e(n_1, \ldots, n_j; N)$ that an error occurs at positions $n_1, \ldots, n_j$
can be computed exactly. However, no closed-form solution
has been found for N > 3. This is mostly due to the fact that
the noise for which the statistics are derived is not the ordered
random variable. Based on the central limit theorem, we show
in this paper that for N large enough, the distribution of $W_i$,
the restriction of the ith ordered noise value to the interval
$[1, \infty)$, is well approximated by the distribution of a normal
random variable that we specify. This approximation leads to

Pe(i;  N)  =  e  No  ,  (1) 

where $m_i = \alpha^{-1}(1 - i/N)$, after defining, for n > 1,
$\alpha(n) = Q(2 - n) - Q(n)$, with the normalization
$Q(z) = (\pi N_0)^{-1/2} \int_z^{\infty} e^{-n^2/N_0}\, dn$. When N is large enough,
Equation 1 provides a tight bound.

If $W_i$ and $W_j$ represent the ith and jth ordered noise values,
it is possible to show that $W_j \mid W_i$ has the density function
of the (j - i)th noise value after ordering a sample of
size N - i from a population with distribution truncated to
the interval $[m(w_i), M(w_i)]$, where $m(w_i) = \min(2 - w_i, w_i)$
and $M(w_i) = \max(2 - w_i, w_i)$. Combining this result with
Equation 1, we show that, for $i \ll N$,

\[ P_e(i, j; N) \approx P_e(i; N)\, P_e(j; N). \tag{2} \]

Generalizing Equation 2 to any ordered set of indices $I_j =
\{n_1, \ldots, n_j\}$ corresponding to positions in error after ordering,
we compute, based on a chain argument,

\[ P_e(n_1, \ldots, n_j; N) \approx \prod_{l=1}^{j-1} P_e(n_l; N) \cdot P_e(n_j; N). \tag{3} \]

Therefore, despite the fact that the random variables representing
the noise after ordering are dependent, their associated
error probabilities tend to behave as if they were independent,
for $N \gg n_{j-1}$ and N large enough.


II.  First  Order  Approximation  of  the  Ordered 
Binary  Symmetric  Channel 
The value $P_e(n_1, \ldots, n_j; N)$ represents the probability that at
least the bits in positions $n_1, \ldots, n_j$ are in error after ordering
a sequence of length N. We now also define $P_{e,N}(n_1, \ldots, n_j)$
as the probability that only the bits at positions $n_1, \ldots, n_j$
are in error after ordering a sequence of length N. While
$P_e(n_1, \ldots, n_j; N)$ is computed by integrating the joint
distribution of the j ordered random variables $W_{n_1}, \ldots, W_{n_j}$,
the computation of $P_{e,N}(n_1, \ldots, n_j)$ requires integrating
the joint distribution of the N ordered random variables
$W_1, \ldots, W_N$. It follows that the discrete time channel model
after ordering is a $2^N$-state BSC with transition probabilities
$P_{e,N}(n_1, \ldots, n_j)$. We refer to this channel as the Ordered BSC
(OBSC). Based on Equation 3, we approximate

\[ P_e(n_1, \ldots, n_j; N) \approx \prod_{l=1}^{j} P_e(n_l; N), \tag{4} \]

which expresses that after ordering, the events of having errors
at positions $n_1, \ldots, n_j$ remain independent. Therefore,
the $2^N$-state fully connected OBSC is equivalent to N time-shared
BSC's corresponding to each ordered position. We
name this approximation the first order approximation
of the OBSC.

The capacity of the OBSC, $C_{N,ave}$, requires the computation
of $2^N$ N-fold integrals and rapidly becomes too complex to
evaluate as N increases. In contrast, the capacity of the first
order approximation of the OBSC

\[ \tilde{C}_{N,ave} = 1 - \frac{1}{N} \sum_{i=1}^{N} H(P_e(i; N)) \ \text{bit} \tag{5} \]

is easily derived. For N = 1, $\tilde{C}_{1,ave}$ is simply the capacity
of the BSC with crossover probability Q(1), while
$\lim_{N \to \infty} C_{N,ave}$ should provide the capacity $C_{BPSK}$ of the
continuous Gaussian channel for BPSK transmission. We observe
that $\tilde{C}_{N,ave} \approx C_{N,ave}$ and that the convergence to this limit
is very fast as N increases, so that

\[ \tilde{C}_{N,ave} \approx C_{N,ave} \approx C_{BPSK}, \tag{6} \]

for N large enough. Equation 6 indicates that when considering
an ordered sequence of sufficient length N, the first
order approximation of the OBSC should provide a good
approximation of the continuous Gaussian channel, for BPSK
transmission. Therefore, for a given SNR, knowing the position
in the ordering instead of the exact received value should
be sufficient from a performance point of view.
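
Equation 5 is simply the average capacity of N time-shared BSCs and can be evaluated as sketched below. This is an illustrative sketch: the per-position error probabilities $P_e(i; N)$ are taken as given inputs (they are not computed here), H is assumed to be the binary entropy function, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def first_order_capacity(position_error_probs):
    """Average capacity of N time-shared BSCs, one per ordered position."""
    n = len(position_error_probs)
    return 1.0 - sum(binary_entropy(p) for p in position_error_probs) / n


# Hypothetical error probabilities for N = 4 ordered positions
# (least reliable position first); placeholder values, not results.
example_capacity = first_order_capacity([0.2, 0.05, 0.01, 0.001])
```

Because entropy is concave, concentrating the errors in a few unreliable positions gives a higher average capacity than spreading the same mean error probability evenly over all positions.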
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Abstract  —  This  paper  analyzes  the  number  of 
computation steps needed on a binary tree to search quickly
for the point, among a set of points given beforehand, that is
nearest to a query point in a Hamming space.

I.  Introduction 

$\{0, 1\}^\ell$ denotes the whole set of binary sequences (called
points) of a length $\ell > 2$. We measure the distance between
points by the Hamming distance normalized by $\ell$.

Suppose that n > 2 arbitrary points $x_1, \ldots, x_n$ (called
samples) in $\{0, 1\}^\ell$ are given, where duplications are allowed.
We consider arranging the samples into a binary
tree and, over it, searching fast for some $\hat{x} \in \{x_1, \ldots, x_n\}$
that is the nearest to any queried point $x \in \{0, 1\}^\ell$.

The authors' last paper [2] mentioned a KM tree [1]
that could search for the nearest point fast, but the search
time was clear neither theoretically nor experimentally.
The present paper evaluates theoretically the number of
computation steps required for the nearest point search
over an alternative tree.

II.  Tree  Construction 

Fix a real constant $\gamma > 0$ called a stopping threshold.
Given an arbitrary sequence $\mathbf{x} = x_1 \cdots x_n$ of n samples,
the following procedure constructs a binary tree $T_\gamma(\mathbf{x})$
each leaf of which stores at most $\gamma n$ samples.

Procedure 1 (tree construction procedure):

Step 1: Construct a tree comprising only a root that stores
$\mathbf{x}$. (Regard this root also as a leaf to start (a)-(b).)

Step 2: While the present tree has at least one leaf N
storing a sequence $z = z_1 \cdots z_{|z|}$ of points such that $|z| >
\gamma n$ and not all of $z_1, \ldots, z_{|z|}$ are the same, do (a)-(b).
Otherwise, answer the present tree.

(a) Let $c = z_1$. Find one of the r-values that make $|z_L|$
and $|z_R|$ as equal as possible, where $z_L$ and $z_R$ denote the
sequences composed of the $z_i$ for which $d(c, z_i) < r$
and $d(c, z_i) \ge r$, respectively.

(b) Store (c, r) on N. Store $z_L$ and $z_R$ respectively on the
left and right child nodes of N. Next, remove z from N. □

III.  The  Nearest  Point  Search 
Let an arbitrary subtree T* of $T_\gamma(\mathbf{x})$ whose vertex is a
nonleaf or leaf node of $T_\gamma(\mathbf{x})$ be given with a supposition
that the total length of the z's stored on all leaves of $T_\gamma(\mathbf{x})$ is
< $1/\gamma$. Let $\mathcal{S}$ denote the set of all points stored on leaves
of T*. Given a fixed real constant $\lambda > 0$ called the pre-bounding
parameter, the following recursive procedure $f_\lambda(T^*, x)$ for
an arbitrary query point $x \in \{0, 1\}^\ell$ tries to answer one of the
points $\hat{y} \in \mathcal{S}$ that achieve $d(x, \hat{y}) = \min_{y \in \mathcal{S}} d(x, y) < \lambda$.

Procedure 2 (search procedure $f_\lambda(T^*, x)$):

Step 1: If T* is a minimal tree, then do Step 3, else do Step 2.

Step 2: For the pair (c, r) of point and nonnegative real
stored on the root N of T*, execute one of (a)-(c).

(a) In case of $d(c, x) < r - \lambda$, compute $\hat{y}_L = f_\lambda(T_L, x)$ and
answer $\hat{y}_L$ as the output of $f_\lambda(T^*, x)$, where $T_L$ denotes
the subtree of T* whose vertex is the left child node of N.

(b) In case of $d(c, x) > r + \lambda$, compute $\hat{y}_R = f_\lambda(T_R, x)$
and answer $\hat{y}_R$.

(c) Otherwise, compute both $\hat{y}_L$ and $\hat{y}_R$. If $d(x, \hat{y}_L) <
d(x, \hat{y}_R)$, then answer $\hat{y}_L$, else answer $\hat{y}_R$.

Step 3: Now T* coincides with a leaf of $T_\gamma(\mathbf{x})$ that stores
a finite sequence $z = z_1 \cdots z_{|z|}$ of points (note that $z_1 =
\cdots = z_{|z|}$ provided that $\gamma < 1/n$). Answer $z_1$. □

The branching into (a)-(c) based on the triangle inequality
cuts off wasteful traversal over $T_\gamma(\mathbf{x})$ efficiently.
The nearest point is searchable by initializing T* as $T_\gamma(\mathbf{x})$
provided that $\gamma < 1/n$.
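
Procedures 1 and 2 can be sketched compactly as follows. This is an illustrative sketch rather than the authors' implementation: points are tuples of bits, the tree is stored in nested dictionaries, candidate radii are restricted to distances that actually occur (which guarantees two non-empty children), and the names `hamming`, `build_tree` and `search` are hypothetical.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit tuples, normalized by the length."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def build_tree(points, gamma, n=None):
    """Procedure 1: split a leaf while it holds more than gamma*n non-identical points."""
    n = len(points) if n is None else n
    if len(points) <= gamma * n or all(p == points[0] for p in points):
        return {"leaf": points}
    c = points[0]
    distances = sorted({hamming(c, p) for p in points})
    best = None
    for r in distances[1:]:  # every such r leaves both sides non-empty
        left = [p for p in points if hamming(c, p) < r]
        right = [p for p in points if hamming(c, p) >= r]
        imbalance = abs(len(left) - len(right))
        if best is None or imbalance < best[0]:
            best = (imbalance, r, left, right)
    _, r, left, right = best
    return {
        "c": c,
        "r": r,
        "left": build_tree(left, gamma, n),
        "right": build_tree(right, gamma, n),
    }


def search(tree, x, lam):
    """Procedure 2: descend one side outside the gray zone, both sides inside it."""
    if "leaf" in tree:
        return tree["leaf"][0]  # Step 3: answer z_1
    d = hamming(tree["c"], x)
    if d < tree["r"] - lam:  # Step 2(a)
        return search(tree["left"], x, lam)
    if d > tree["r"] + lam:  # Step 2(b)
        return search(tree["right"], x, lam)
    y_left = search(tree["left"], x, lam)  # Step 2(c)
    y_right = search(tree["right"], x, lam)
    return y_left if hamming(x, y_left) < hamming(x, y_right) else y_right


# Deterministic toy data: 40 samples of length 12, gamma < 1/n.
SAMPLES = [tuple(((i * 37 + 11) >> b) & 1 for b in range(12)) for i in range(40)]
TREE = build_tree(SAMPLES, gamma=1.0 / 41)
```

Setting the pre-bounding parameter to 1 (the largest normalized distance) makes every node a gray-zone node, so the search degenerates to an exhaustive one; smaller values prune more branches but only guarantee the nearest point when it lies within distance $\lambda$ of the query.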


IV.  Computation  Steps  for  a  Point  Search 
Lemma 1: Selected n i.i.d. samples $x_1, \ldots, x_n$ in $\{0, 1\}^\ell$,
the depth of $T_\gamma(\mathbf{x})$ is almost surely < $\log_2(1/\gamma)$ if $\ell$ and
n are sufficiently large. □

For each nonleaf node N of $T_\gamma(\mathbf{x})$, let $G_N$ (called a
gray zone) denote the set of all query points that activate
Step 2(c) in Proc. 2, i.e., $G_N = \{x \mid x \in \{0, 1\}^\ell,\ r - \lambda \le
d(c, x) \le r + \lambda\}$, where (c, r) is the pair stored on N.

Lemma 2: Fix an arbitrary real constant $\eta > 0$. Selected
a query point x uniformly in $\{0, 1\}^\ell$, the probability
that x may belong to $G_N$ on condition that a node pointer
latches a nonleaf node N of $T_\gamma(\mathbf{x})$ at Step 2 in Proc. 2 is
< $\eta$ if $\ell$ is sufficiently large. □

In applying Proc. 2 on $T_\gamma(\mathbf{x})$, let $\mu_\gamma(\mathbf{x})$ denote

\[ \mu_\gamma(\mathbf{x}) = \sum_{x \in \{0,1\}^\ell} \left( \text{the number of the latched nodes for a query point } x \right) \cdot P(x), \tag{1} \]

where $P(x) = 1/|\{0, 1\}^\ell| = 2^{-\ell}$ for all $x \in \{0, 1\}^\ell$. We can
regard this $\mu_\gamma(\mathbf{x})$ as the mean number of the latched nodes
in one application of Proc. 2.

Lemma  3:  Selected  n  i.i.d.  samples  2:1,  •••,#„  in 
{0,1}',  +y(x)  is  almost  surely  <  log2(l/7)  T  2  +  1  / 77  if 
/  is  sufficiently  large.  □ 

Corollary 3: $\mu_\gamma(\mathbf{x})$ with $\gamma < 1/n$ is almost surely of
$O(\log n)$ if $\ell$ is sufficiently large. □

Thus, the mean number of computation steps of Proc. 2
is almost surely of $O(\log n)$ if $\ell$ is sufficiently large.
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Abstract—  We  obtain  the  upper  bound  on  the 
probability of undetected error which is valid
uniformly over the choice of the symbol inversion
probability. This bound is better than
previously known bounds.

Let $F_2^n$ be the Hamming space of binary sequences of length n
with metric $d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$, $x, y \in F_2^n$. Let
$C_n(y, r) = \{x \in F_2^n : d(x, y) = r\}$ be the sphere of radius r with
center $y \in F_2^n$. For an arbitrary linear code $A_{nk} \subset F_2^n$
of dimension k ($|A_{nk}| = 2^k$) define the set $A_x = \{A_x^r \mid r =
0, 1, \ldots, n\}$, $x \in A_{nk}$, where $A_x^r = |A_{nk} \cap C_n(x, r)|$ is the
number of vectors from $A_{nk}$ whose distance from $x \in A_{nk}$
is equal to r. It is easy to see that $A_x$, $A_x^r$ do not depend
on $x \in A_{nk}$, so we omit the index x in the notations
$A_x$, $A_x^r$. The set A is called the spectrum of the code $A_{nk}$.
For arbitrary $p \in [0, 1]$ the probability of undetected error
$P_{ue}(p, A_{nk})$ is defined by the equality

\[ P_{ue}(p, A_{nk}) = \sum_{r=1}^{n} A^r p^r (1 - p)^{n-r}. \]

We are interested in the value

\[ P(n, k) = \min_{A_{nk} \subset F_2^n} \max_{p \in [0,1]} P_{ue}(p, A_{nk}). \]

It is easy to show that

\[ P_{ue}(1/2, A_{nk}) = (2^k - 1)\, 2^{-n}, \]

so

\[ P(n, k) \ge 2^{k-n} - 2^{-n}. \]
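
The definition of the undetected error probability and the value at p = 1/2 behind the lower bound can be checked numerically. The sketch below assumes the spectrum is given as a list A[0..n] of weight counts and uses the (7,4) Hamming code as an illustrative example; the function name is hypothetical.

```python
def undetected_error_probability(spectrum, p):
    """P_ue(p, A) = sum over r >= 1 of A^r * p^r * (1 - p)^(n - r)."""
    n = len(spectrum) - 1
    return sum(
        a_r * p**r * (1.0 - p) ** (n - r)
        for r, a_r in enumerate(spectrum)
        if r >= 1
    )


# Weight spectrum of the (7,4) Hamming code: A^0, ..., A^7.
HAMMING_7_4 = [1, 0, 0, 7, 7, 0, 0, 1]

# At p = 1/2 every non-zero codeword contributes 2^(-n),
# so the sum equals (2^k - 1) * 2^(-n) = 15/128 here.
value_at_half = undetected_error_probability(HAMMING_7_4, 0.5)
```

Since the maximum over p is at least the value at p = 1/2, this evaluation gives the lower bound on P(n, k) for any linear code with $2^k$ codewords.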

The best known upper bound on P(n, k) which is valid
for all n, k was obtained in [1] and is as follows:

\[ P(n, k) \le C_1 \sqrt{n}\, 2^{k-n}, \]

where $C_1$ is a constant ($C_1 \le \sqrt{\pi/2}\,(1 + o(1))$, $n \to \infty$).
This bound was obtained by estimating the RHS
of the following inequality offered earlier in [2]:


P(fa)<2fc-*£c; 

T=1 


Here we present the result, which is stated in the
following theorem.

Theorem 1 For some constant $C_2$ and for all n, k the
following estimate is valid:

\[ P(n, k) \le (C_2 \sqrt{\ln n} + 1)\, 2^{k-n}. \]


During  the  proof  of  this  theorem  we  show  that  at  least 
for  sufficiently  large  n  the  estimation  C2  <  2/y/*is  valid, 
but  it  can  be  improved  by  the  more  precise  calculations. 

To prove this theorem we divide the spectrum A into
six parts and prove the existence of a code $A_{nk}$ whose
spectrum satisfies the following relations:


53  >lrpr(l =  0; 

r=l 


£  A'fll 


1  -  P)—’  <  (l  + 1^)  2‘— , 

r=»,  x  ' 


TZ rtl  — *2+1 


—ft, 

> 


£  ATpT(l  —  p)*-r  —  0 

r=ti— *1  4*1 


for some $1 < s_1 < s_2 < n/2$. Note that if $k > C_4 \ln^2 n$ for
some constant $C_4 > 0$ then using the same arguments as
in the proof of the theorem it is easy to prove the
stronger upper bound for P(n, k):

\[ P(n, k) \le C_5\, 2^{k-n}, \]

where $C_5 > 0$ is some constant. We conjecture that in
order to prove the last estimate for all values of k and
n it is necessary to use some additional nonprobabilistic
arguments.
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Abstract  —  This  contribution  is  concerned  with  bit 
parallel inverters over finite fields. Two alternative
approaches for inversion with low complexity will be
reviewed. Both methods are based on multiple field
extension of GF(2). It will be shown that one architecture
is a generalization of the other architecture's core
algorithm. As an impressive example, the complexity
of an inverter in the field $GF(2^8)$ will be computed.

I.  Inversion  in  Extension  Fields  of  Degree  Two 
The first architecture was proposed in [2] in 1989 and reintroduced
in [3] in 1991. The core part of the architecture is
the following. Let us consider an element $A = a_0 + a_1 x$ from
$GF((2^{k/2})^2)$, where $a_0, a_1 \in GF(2^{k/2})$. There always exists a
field polynomial of the form $P(x) = x^2 + x + p_0$, where $p_0 \in
GF(2^{k/2})$. If the inverse is denoted as $B = A^{-1} = b_0 + b_1 x$, the
equation $A \cdot B = [a_0 b_0 + p_0 a_1 b_1] + [a_0 b_1 + a_1 b_0 + a_1 b_1]x = 1$
must be satisfied, which is equivalent to a set of two linear
equations in $b_0, b_1$ over $GF(2^{k/2})$ whose solution is:

\[ b_0 = \frac{a_0 + a_1}{\Delta}, \quad b_1 = \frac{a_1}{\Delta}, \quad \text{where } \Delta = a_0(a_0 + a_1) + p_0 a_1^2. \tag{1} \]

The advantage of this algorithm is that all operations are performed
in $GF(2^{k/2})$. The algorithm can be applied recursively.

II.  Inversion  in  Composite  Fields 
The second architecture was proposed in the last section of
Itoh-Tsujii's paper from 1988 [1, Section 6]. It is based
on so-called composite fields, which are finite fields with two
extensions $GF((2^n)^m)$. We start with the trivial identity
$A^{-1} = (A^r)^{-1} A^{r-1}$. If the auxiliary parameter r is defined as

\[ r := \frac{2^{nm} - 1}{2^n - 1} = 1 + 2^n + \cdots + 2^{(m-1)n}, \]

we obtain the important property: $A^r \in GF(2^n)$, $\forall A \in GF((2^n)^m)$.
We are now able to state a four step algorithm for computing the
inverse of A:

Step 1 Compute $A^{r-1}$

Step 2 Compute $A^{r-1} A = A^r$

Step 3 Compute $(A^r)^{-1} = A^{-r}$ (Inversion in $GF(2^n)$)

Step 4 Compute $A^{-r} A^{r-1} = A^{-1}$


III.  A  Relation  Between  the  Architectures 

For the development of a relation between the two architectures,
we consider [1] with composite fields $GF((2^n)^2)$ and
$P(x) = x^2 + x + p_0$. An arbitrary field element is represented
by $A(x) = a_1 x + a_0$, its inverse by $B := A^{-1} = b_1 x + b_0$. The
parameter r is now $r = 2^n + 1$. By denoting $x^{r-1} = s_1 x + s_0$,
Step 1 of the algorithm is: $A^{r-1} = [a_1 s_1]x + [a_1 s_0 + a_0]$. The
computation in Step 2 is: $A^r = [a_0 s_1 + a_1 s_0 + a_0 + a_1 s_1] a_1 x +
[a_0 a_1 s_0 + a_0^2 + a_1^2 s_1 p_0]$. Since $A^r$ is an element of the subfield
its coefficient at x is zero, and thus $a_1 s_0 + a_0 = (a_0 + a_1) s_1$.
Inserting this relation in the expressions for $A^{r-1}$ and $A^r$ yields:

\[ B(x) = \frac{A^{r-1}}{A^r} = \frac{a_1 x + (a_1 + a_0)}{a_0 (a_1 + a_0) + a_1^2 p_0}. \tag{2} \]

1The  research  was  done  while  the  author  was  with  the  Institute 
for  Experimental  Mathematics,  University  of  Essen,  Germany. 


Equation  (2)  is  the  same  as  the  Equations  (1).  [1]  can  thus  be 
viewed  as  a  generalization  of  the  core  algorithm  of  [2].  [1]  is, 
however,  not  a  generalization  of  the  architecture  of  [2],  since 
the  latter  allows  multiple  field  extensions  of  degree  two. 

IV. Efficient Bit Parallel Inversion in $GF(2^8)$
For the application of the architecture [2], the decomposition
of $GF(2^8)$ into $GF((2^4)^2)$ is considered. Let $Q(y) = y^4 + y + 1$
be the primitive polynomial generating $GF(2^4)$ with $Q(\omega) = 0$
and $P(x) = x^2 + x + \omega^{14}$ the primitive polynomial generating
the composite field. For computing Equations (1) in hardware,
the following $GF(2^4)$ arithmetic modules must be provided:

•  A direct approach allows inversion with not more than
15 XOR/10 AND gates [4, Appendix A].

•  Three multiplications require 45 XOR/48 AND [5].

•  The two additions require 2 · 4 = 8 XOR gates.

•  Constant multiplication with $\omega^{14}$ requires 1 XOR gate.

•  Squaring  of  an  element  requires  2  XOR  gates. 

The resulting overall gate count of 71 XOR/58 AND is remarkably
low. It is interesting to compare this complexity
with bit parallel multiplication. For instance, the multiplier
[5] has a gate count of 84 XOR/64 AND.

V.  Conclusions  and  Further  Research 
Decomposition of Galois fields $GF(2^k)$ can lead to area-efficient
inverters. In general, this approach seems promising
since multipliers over composite fields can also be realized
efficiently [3] [6]. For certain fields, in particular for $GF(2^8)$,
an inverter can be realized with a gate count smaller than
that of a multiplier. This result is contrary to common belief.

For technical applications it will be helpful to provide generators
$x^2 + x + p_0$ for tower fields with multiple field extensions
of degree two. Lists of irreducible polynomials over non-prime
fields are very rare in the literature. The zero coefficients
$p_0$ of these polynomials should be optimized.
Abstract - Simulation results for concatenated outer Reed-Solomon
and inner convolutional codes used in multilevel
schemes are presented. Different high-rate inner convolutional
codes are considered, viz., punctured codes and partial unit
memory (PUM) codes. Best results are obtained for PUM codes,
since they have a better extended row distance profile. The effect
of channel and block interleaving at the different levels is also
studied, and iterative decoding is tried.

I.  Introduction 

A multilevel code uses some signal set $S_0$ which is a finite subset
of a lattice or a set of points with some group structure. This set is
partitioned into a k-level partitioning chain, $S_0/S_1/\ldots/S_k$, which
can be described as a rooted tree with k + 1 levels (the root is level
zero). Every node at level i is partitioned into disjoint subsets which
are cosets. Each partition at level i is determined by a
component code $C_i$. In general these component codes may be of
any type, but for this work we have only considered convolutional
component codes and concatenated component codes with inner
convolutional and outer Reed-Solomon codes. Using multilevel
codes one can achieve arbitrarily large squared Euclidean free
distance.

The structural properties of multilevel codes make them attractive
for code constructions. Unfortunately, the decoding has to be carried
out in a way which is not maximum likelihood, since otherwise the
computational effort becomes far too large even for small systems
(i.e., systems with not very complex component codes). The computational
complexity of the preferred multistage decoding procedure
from [1] is proportional to the sum of the complexities of the component
codes, but it suffers from error propagation. In order to minimize
the errors at each level, a concatenated scheme with outer
Reed-Solomon and inner convolutional codes was considered. The
errors of the inner convolutional decoders occur in bursts, and the
idea is that the inherent burst error correcting capability of the outer
RS code will correct these errors.

Our system transmits signals over the AWGN channel. The signal
constellation used is 8-PSK. This implies three levels in the system.
Since the partition chain is 8-PSK/4-PSK/2-PSK/1-PSK, the
minimum squared Euclidean distance among the signal points in the
subsets at the different levels increases for each partition. Therefore
level 1 must be protected by a more powerful code
than level 2, et cetera.
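
The growth of the intra-subset distance along the partition chain can be checked numerically. The short Python sketch below assumes unit-energy PSK constellations (an assumption, since the normalization is not stated here) and computes the minimum squared Euclidean distance of 8-PSK, 4-PSK and 2-PSK; the function name is hypothetical.

```python
import cmath
import math


def min_squared_distance(num_points):
    """Minimum squared Euclidean distance of unit-energy PSK with num_points points."""
    points = [cmath.exp(2j * math.pi * k / num_points) for k in range(num_points)]
    return min(abs(p - q) ** 2
               for i, p in enumerate(points)
               for q in points[i + 1:])


# Subsets met at levels 1, 2 and 3 of the chain 8-PSK/4-PSK/2-PSK/1-PSK.
chain = {m: min_squared_distance(m) for m in (8, 4, 2)}
```

Under this normalization the three values are 2 - sqrt(2), 2 and 4, so each partition step increases the distance seen by the next level.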

II.  Simulation  Results 

The simulations show that there is no need for a concatenated
code at level 3. In order to retain as high an overall rate as possible, the
rate of the inner code at level 2 must be quite large. Due to its simple
decoder implementation, a punctured convolutional (PC) code was
tested. Simulations then show that the bit error rate (BER) performance
of level 2 bounds the overall code BER. This is caused by the
bad extended row distance profile of punctured codes, i.e., error vectors
e of small weight are enough to result in quite long bursts. As an
alternative, a PUM code was tested. There exist decoding procedures
for these codes [3] that are not more complex than decoding
of PC codes. The simulations show that a PUM code with overall
constraint length one less than the previously used PC code
performed only negligibly worse (< 0.05 dB). From a practical point of
view the reduced decoding complexity is far more important.

We need to minimize error propagation at each level (between
the inner and outer code) as well as the error propagation between
levels. The first is accomplished by reducing the length of error
events from the inner decoder by applying block interleaving. The
simulations confirmed a theoretical result from [2] on how many
rows the interleaver matrix needs in order to maximize the free
distance of the concatenated system. One idea for decreasing the
error propagation between levels is to interleave the coset labels of
different levels in time (channel interleaving). However, this seems
to be of little help. Comparing simulations of our system without
channel interleaving with simulations of a theoretical system without
any error propagation at all (a genie between every level) shows
a difference of less than 0.05 dB already at a BER of $10^{-4}$.

Finally we studied iterative decoding and its influence at the different
levels. There is no immediate way of extracting the error
probability of individual decoded bits because of the hard decoding
of the RS codes. It turns out that only the first level benefits from
'hard' iterative decoding. The improvement on the whole system is
only marginal (the asymptotic error performance follows level 2
instead of level 1). The total BER is not changed more than a few
tenths of a dB.


Fig. 1. Simulation results for the concatenated system vs. uncoded 4-PSK
(bit error probability versus E/N0 in dB).
Abstract - A simple method is presented for performing soft-decision
demodulation and decoding of trellis-coded phase-difference
modulations. Results are given for trellis-coded M-ary
differential phase-shift keying and M-ary double-differential
phase-shift keying, each with soft-decision demodulation and
decoding. The performances of these combinations of coding,
modulation, demodulation, and decoding are presented for
channels which may introduce a phase ramp in the modulated
signal.


Summary 

Phase-difference modulation, such as M-ary differential phase-shift
keying (M-DPSK) and M-ary double-differential phase-shift
keying (M-D2PSK) [1], is desirable for some mobile radio systems
and channels in which it is difficult to obtain an accurate phase
reference. Either M-DPSK or M-D2PSK may be coupled with a
trellis code to decrease the probability of bit error for a given
signal-to-noise ratio (SNR). As the rate of the code is decreased, the
number M of points in the M-ary PSK (M-PSK) signal constellation
must be increased in order to transmit the same rate of information
in the same bandwidth. As M is increased, the probability
of symbol error increases, even for a channel with perfect phase
stability. However, even greater degradation results if there is Doppler
shift in the channel or phase drift in the system's oscillators. It
is therefore of interest to investigate modulation and coding systems
that can tolerate such a phase variation in the carrier signal.
To avoid trivialities, it is assumed in all that follows that M>4.

Trellis coding provides coding gain to offset the increase in
symbol error probability that results from increasing M. Optimal
trellis demodulation and decoding may be too complex to implement
in a mobile radio system. An alternative method which performs
nearly as well and is much less complex is to perform the
demodulation and decoding separately. For example, it has been
suggested that the pragmatic trellis code be demodulated in this
way, with hard or soft bit decisions at the output of the demodulator
being input to a convolutional decoder modified to correct parallel
branch errors [2].

The decision regions for standard hard-decision demodulation
of M-PSK signals correspond to equal-length intervals for the
phase of the received signal. As a consequence, standard
hard-decision demodulation is easy to implement, but it does not provide
information on the relative reliabilities of the bit decisions
that result from a symbol decision. Because some bit decisions are
more reliable than others, soft-decision demodulation and decoding
should be employed.

The natural generalization of the standard method for
soft-decision demodulation and decoding of binary signals (e.g., binary
PSK) is not effective in M-PSK demodulation, in part because the
reliabilities of the bit decisions do not depend only on the received
signal strength. The optimum method for soft-decision decoding
for a channel with perfect phase stability is too complex for most
applications; in particular, it requires an accurate measurement of
the SNR in the front end of the receiver. In addition, this method
may perform very poorly if there is any phase drift in the carrier.

ITT Aerospace and Communications Division. John M. Shea is the recipi-

We propose a suboptimal method to generate quantized soft
information for each bit associated with an M-PSK symbol. This
method exploits the way bits are assigned to symbols in the
M-PSK constellation, and it is simple to implement in the last stage
of the demodulator. Simulation results show that the proposed
method provides a significant performance improvement over
hard-decision demodulation and decoding. The method is based on
dividing each hard-decision phase interval into subintervals, using
phase as the only criterion. The weights for the individual bits are
constant throughout each subinterval, but they vary among the
subintervals, even within the same hard-decision interval. The length
of the subintervals can be adjusted to optimize performance.
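
One way such a phase-only quantizer could look is sketched below in Python. This is a hypothetical illustration, not the authors' quantizer: it assumes M = 8, a Gray labeling of the phase sectors, three subintervals per sector, and a single reliability flag per bit (so each bit decision occupies two bits). The edge width `EDGE` and all names are invented for the example.

```python
import math

M = 8
SECTOR = 2 * math.pi / M
GRAY = [k ^ (k >> 1) for k in range(M)]  # assumed Gray label of sector k
EDGE = 0.25  # hypothetical width of each edge subinterval (fraction of a sector)


def soft_bits(phase):
    """Return [(bit, reliable)] for the 3 label bits, using the phase only."""
    u = (phase / SECTOR + 0.5) % M   # position in units of sectors
    k = int(u) % M                   # hard-decision sector index
    frac = u - int(u)                # position inside the sector, 0..1
    label = GRAY[k]
    if frac < EDGE:                  # close to the previous sector
        neighbour = GRAY[(k - 1) % M]
    elif frac > 1.0 - EDGE:          # close to the next sector
        neighbour = GRAY[(k + 1) % M]
    else:                            # middle subinterval
        neighbour = label
    doubtful = label ^ neighbour     # bits that flip across the nearby boundary
    return [((label >> i) & 1, ((doubtful >> i) & 1) == 0) for i in range(3)]
```

In this sketch only the bit that changes across the nearest sector boundary is marked unreliable, which reflects the observation that the bit reliabilities differ within one hard-decision interval; tuning `EDGE` corresponds to adjusting the subinterval length.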

A simulation was employed to obtain numerical values for
the additional coding gain for soft-decision decoding over
hard-decision decoding. The bit error probability is shown in Figure 1
as a function of $E_b/N_0$, the energy per information bit divided by
the one-sided spectral density of the white Gaussian noise. The
dashed curves illustrate that the simple two-bit quantized soft-decision
decoding scheme for 8-DPSK with the rate 2/3 pragmatic
trellis code provides up to 1.5 dB additional coding gain over the
hard-decision system on the additive white Gaussian noise channel
with a stable phase. The solid curves show that the simple two-bit
quantized soft-decision decoding scheme with 8-DPSK performs
up to 3.5 dB better than the hard-decision system for a system with
a 10 degree phase rotation. The phase rotation is defined as the
phase change over the duration of one M-ary symbol due to a linear
phase drift in the carrier. The two-bit soft-decision decoding
scheme used with M-D2PSK provides up to 2.2 dB coding gain
over hard-decision decoding for channels with stable phase and
channels with phase ramps.
Figure 1. Comparison of hard- and soft-decision decoding
for two channels
Abstract - Punctured convolutional codes allow an
easy implementation of variable-rate encoders/decoders. In
this paper, the puncturing technique is used to generate new
QAM trellis codes from a rate-1/2 code. These codes are true
high-rate codes, without parallel branches in the trellis. A
simplified decoding technique is also presented. It is shown
that the advantages the puncturing technique provides with
binary convolutional codes are essentially maintained with
Trellis-Coded Modulation.

Summary 

Trellis-Coded Modulation (TCM) can yield significant
coding gains of 3 to 6 dB over uncoded modulation without
bandwidth expansion [1]. Unfortunately, with Ungerboeck's
usual TCM, each signal constellation requires a different
code. For example, a code for 8-PSK is different from a
16-PSK code. As a consequence, implementing a system
with various spectral efficiencies (e.g., 2, 3 and 4 bits/s/Hz)
would necessitate several distinct encoders/decoders. In
addition, since there are $2^m$ branches converging onto each
trellis state for a rate R = m/(m+1) TCM code, decoding
such a code with the Viterbi algorithm requires $(2^m - 1)$
binary comparisons per state. Hence, Viterbi decoding
in the usual manner quickly becomes impractical as the
number of states and the coding rate increase. A pragmatic
approach to this problem has been proposed by using a
rate-1/2, 64-state convolutional code and adding (m-1) uncoded
bits to the output to produce a rate R = m/(m+1) code [2,
3]. The disadvantage of this approach is that the trellis
exhibits parallel branches. For some codes, limiting the free
distance to the distance between parallel branches leads to
suboptimality.

It  has  been  shown  that  the  puncturing  technique  can 
be  applied  to  TCM  [4].  Using  extensive  computer  searches, 
8-PSK  and  16-PSK  punctured  codes  have  been  found  with 
free  squared  Euclidean  distances  that  are  either  equal  to  or 
almost  as  large  as  the  distances  of  the  best  known  codes 
discovered  by  Ungerboeck.  The  puncturing  technique  can 
also  provide  codes  with  uncoded  input  bits  and  parallel 
branches  in  the  trellis.  Furthermore,  variable-rate  punctured 
TCM  codes  have  also  been  found  using  computer  search. 
Families  of  QPSK,  8-PSK  and  16-PSK  codes,  which  are 
quite  good  in  the  sense  of  Euclidean  distance  as  compared  to 
the best known codes, have been obtained from a single rate-1/2
convolutional code and a varying puncturing pattern [5].
The advantage of using a single rate-1/2 code is that variable
bandwidth efficiencies and hence, variable throughputs can
be achieved with a single encoder/decoder.
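
The mechanics of deriving a higher-rate code from one rate-1/2 encoder can be sketched as follows. The Python example is illustrative only: the 4-state generators (7, 5 octal) and the puncturing pattern are hypothetical stand-ins chosen for brevity, not the codes or patterns found by the computer searches cited above, and the mapping of the punctured bits onto signal points is omitted.

```python
G1, G2 = 0b111, 0b101  # hypothetical rate-1/2 generators (7, 5 in octal)


def parity(v):
    """Parity (sum modulo 2) of the bits of v."""
    return bin(v).count("1") & 1


def encode_rate_half(bits):
    """Feedforward rate-1/2 convolutional encoder, zero initial state."""
    state = 0  # the two previous input bits
    pairs = []
    for u in bits:
        reg = (u << 2) | state
        pairs.append((parity(reg & G1), parity(reg & G2)))
        state = reg >> 1
    return pairs


PATTERN = [(1, 1), (1, 0)]  # hypothetical pattern: drop every second G2 output


def puncture(pairs, pattern=PATTERN):
    """Keep only the coded bits whose pattern entry is 1."""
    kept = []
    for t, pair in enumerate(pairs):
        mask = pattern[t % len(pattern)]
        kept.extend(bit for bit, keep in zip(pair, mask) if keep)
    return kept


coded = puncture(encode_rate_half([1, 0, 1, 1]))  # 4 input bits -> 6 coded bits
```

With this pattern three coded bits are kept for every two information bits, giving rate 2/3; changing only `PATTERN` changes the rate while the encoder stays the same.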

The puncturing technique presented here is quite flexible,
allowing either a true high-rate code or a code with
parallel  branches.  In  this  paper,  new  8-QAM,  16-QAM  and 
32-QAM  punctured  trellis  codes  are  presented.  These  codes 
are  true  high-rate  codes  without  parallel  branches.  The  free 
Euclidean  distance  is  not  limited  by  the  distance  between 
parallel  branches  and  hence,  when  the  number  of  states  is 
large,  these  codes  can  provide  a  larger  free  distance  than 
codes  with  parallel  branches.  Furthermore,  over  Rayleigh 
fading  channels,  the  absence  of  parallel  branches  in  the 
trellis  is  beneficial  since  codes  without  parallel  branches 
yield  a  better  error  performance  than  codes  having  parallel 
branches. 

By using the fact that these QAM codes are generated
from a rate-1/2 code, decoding can be performed on the
low-rate trellis. Hence, the reduction in the number of
binary comparisons the puncturing technique provides with
convolutional codes is essentially maintained with TCM at
the cost of a slight degradation in the error performance.
These decoding techniques and simulation results will be
presented.
Abstract - New design rules for multilevel codes
with finite codeword length are derived from information
theory, leading to digital transmission schemes
with high power and bandwidth efficiency.


I.  Introduction 

Multilevel coding (MLC) is a well known approach to create
power and bandwidth efficient communication schemes.
Usually, the component codes are designed for balanced Euclidean
distance for all levels, see e.g. [2]. But this rule does
not take into account the tremendously increasing number of
nearest neighbour error events for low levels due to the multiple
representation of code symbols by signal points, cf. [3].
Thus, in multistage decoding a predomination of errors in low
levels can be observed which leads to a serious degradation in
power efficiency. Therefore, we propose to design the component
codes using parameters from information theory of the
equivalent channels at the individual levels.


II.  Multilevel  Coding 

MLC for an $M = 2^\ell$-ary digital modulation scheme is based
on a binary set partitioning of the signal constellation $A =
\{a_m \mid m \in \{0, 1, \ldots, M-1\}\}$ defining a mapping $m \leftrightarrow c$ of
binary labels $c = (c^0, c^1, \ldots, c^{\ell-1})$ to the signal points $a_m$.
The subsets of signal points at level i are denoted by the path
to the subsets in the set partitioning tree, i.e.

$$A_{c^0 \ldots c^i} = \{a_m \mid m \leftrightarrow (c^0, \ldots, c^i, x^{i+1}, \ldots, x^{\ell-1}),\ x^j \in \{0, 1\}\}.$$

At each level i equivalent channels can be considered for
the transmission of binary symbols $c^i$. The sum of capacities
$C^i$ of these equivalent channels yields the capacity C of the
communication scheme ([4], [3]). Consequently, we proposed
to choose the rates $R^i$ of long codes at levels i equal to the
capacities $C^i$ [3].


III.  Rate  Design  for  Finite  Blocklength 

The blocklength of MLC schemes is limited due to restrictions
like delay or decoder complexity. Therefore, a design
rule for MLC with finite and uniform length n of the component
codes at each level is presented in this paper.

The tool to consider codes with finite length n is the random
coding bound

$$p_e \le 2^{-n E_r(R)}, \quad (1)$$

where $p_e$ denotes the probability of block errors and $E_r(R)$
the random coding exponent.

For transmission of a symbol $c^i$ at level i in a MLC scheme a
point of the subset $A_{c^0 \ldots c^i}$ is selected equiprobably. Thus, the
probability density function (pdf) of the continuous channel
output y for given $c^i$ reads

$$f_Y(y \mid c^i) = \frac{1}{|A_{c^0 \ldots c^i}|} \sum_{a_m \in A_{c^0 \ldots c^i}} f_Y(y \mid a_m), \quad (2)$$

where the conditional pdf's $f_Y(y \mid a_m)$ characterize the discrete
memoryless channel. From this equation, the random coding
exponents $E_r^i(R^i)$ for the equivalent channels at levels i of a
MLC scheme can be calculated in a straightforward way.

A suitable representation of the random coding bound for
the rate design are isoquants

$$E_r^i(R^i) = -\frac{\log_2 p_e}{n} = \text{const.} \quad \forall \sigma^2, \quad (3)$$

where $\sigma^2$ denotes the noise variance per dimension. We propose
the design rule:

For a maximum tolerable block error rate $p_e$ and given
codeword length n at all levels, choose the rates $R^i$
of a MLC scheme from the corresponding isoquants of
the random coding exponents $E_r^i(R^i)$ for given noise
variance $\sigma^2$ or given total rate $R = \sum_i R^i$.
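
The design rule can be illustrated numerically. The Python sketch below is a simplified stand-in and not the computation used in this paper: it models each level's equivalent channel as a binary symmetric channel (a loud assumption, since the actual equivalent channels have continuous outputs), evaluates Gallager's random coding exponent for it, and picks the rate on the isoquant $E_r(R) = -\log_2(p_e)/n$ by bisection. The crossover probabilities and all names are hypothetical.

```python
import math


def e0(rho, p):
    """Gallager function E_0(rho) in bits for a BSC(p) with uniform inputs."""
    s = 1.0 / (1.0 + rho)
    return rho - (1.0 + rho) * math.log2(p ** s + (1.0 - p) ** s)


def random_coding_exponent(rate, p, steps=1000):
    """E_r(R) = max over 0 <= rho <= 1 of E_0(rho) - rho * R (grid search)."""
    return max(e0(k / steps, p) - (k / steps) * rate for k in range(steps + 1))


def capacity(p):
    """Capacity of a BSC(p) in bits per channel use."""
    return 1.0 + p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p)


def design_rate(p, block_error_rate, n, iterations=60):
    """Rate on the isoquant E_r(R) = -log2(p_e) / n, found by bisection."""
    target = -math.log2(block_error_rate) / n
    low, high = 0.0, capacity(p)
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if random_coding_exponent(mid, p) >= target:
            low = mid
        else:
            high = mid
    return low


# Hypothetical crossover probabilities standing in for three levels.
levels = [0.08, 0.02, 0.001]
rates = [design_rate(p, block_error_rate=1e-3, n=2000) for p in levels]
```

Every level is held to the same exponent, so the noisier levels receive lower rates, and each designed rate stays below the capacity of its level; for growing n the target exponent shrinks and the rates approach the capacities.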

IV.  Simulation  Results 

Simulation results for digital PAM transmission with MLC
over the AWGN channel are presented. Turbo codes [1] with
rates designed from the random coding bound are employed as
component codes. For 16QAM with total rate R = 3 and
blocklength n = 2000 a bit error rate (BER) < $10^{-5}$ is
achieved only 1.4 dB above the capacity limit. For n = 20000,
BER < $10^{-5}$ only 0.8 dB above capacity has been observed.
For 8PSK with total rate R = 2, simulation results are similar.
The results for 16QAM can be extended to M > 16-ary
QAM schemes by imposing further uncoded levels. Furthermore,
these uncoded levels can be employed to achieve an
additional shaping gain.

V.  Conclusion 

The benefits of powerful binary codes can be transferred
to any digital transmission scheme via the multilevel coding
approach, if the individual rates are well chosen, e.g. according
to the random coding bound criterion for the individual
levels. Application of Turbo codes to MLC schemes offers digital
communication close to the capacity limit for a wide range
of trading power for bandwidth efficiency.
Abstract - Trellis coding of Gaussian minimum shift keying (GMSK)
is considered. The structure of combinations of rate 1/2 and 2/3 binary
convolutional encoders and GMSK modulation with several values of
the parameter BT is studied by means of the so-called "matched coding
approach" [4,5]. It is shown that in such connections up to 3 distinct
classes  of  codes  can  be  identified  each  with  different  receiver 
complexity.  The  results  of  the  optimization  procedure  for  codes 
combined  with  GMSK  are  given.  The  results  show  that  significant 
coding  gains  (over  6.5  dB)  are  obtained.  Power-bandwidth  performance 
of  the  best  coded  schemes  is  presented  where  it  is  demonstrated  that 
variation  of  BT  offers  another  degree  of  freedom  in  the  design  of 
communication  systems. 


I.  Introduction 


Demand for spectrally efficient modulation techniques for use in various
communication systems and its inherent properties make Gaussian
minimum shift keying (GMSK) [1] an attractive scheme for prospective
applications. In recent years, trellis coding of modulations with memory has
gained much attention since it usually offers significant coding gains and
hence improved power efficiency, which is especially important in
power-limited systems [2, 3]. In this paper, we study the application of the
trellis coding technique to GMSK schemes with selected values of the normalized
bandwidth of the premodulation filter BT. The first objective is to analyze
how convolutional codes interact with the memory of the GMSK modulator
and how it influences the trellis of the combined receiver for the coded
scheme. We also give quantitative results of coding gains over the uncoded
signals that can be achieved due to trellis coding. Finally, we present the
performance of the best coded GMSK schemes in terms of
power-bandwidth tradeoffs and compare them to other binary systems.

The  considered  system  consists  of  a  convolutional  encoder  followed  by  a 
GMSK  modulator,  AWGN  channel  and  the  optimum  Viterbi  receiver  which 
uses  a  combined  encoder-modulator  trellis  for  joint  demodulation  and 
decoding.  The  GMSK  signal  is  a  constant  envelope  RF  phase-modulated 
signal  where  the  information  carrying  phase  is  given  by: 
where is the transmitted symbol, and g(t) is the frequency impulse of the
form:

The values of L which determine the duration of the impulse g(t) depend on
the particular GMSK scheme. For a finite length LT of g(t) a modulator can
be represented as a finite-state sequential machine. Following the approach
of [4], a precoder T(D) = 1 + D was used in our system which precodes the
input to the modulator, making it a feedback-free one.
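
The action of the precoder on a data sequence can be sketched in a few lines of Python. The example assumes that T(D) = 1 + D operates on binary symbols over GF(2) and that the delay element starts at zero; both are assumptions for illustration, and the function name is hypothetical.

```python
def precode(bits, initial=0):
    """Apply T(D) = 1 + D over GF(2): output_k = input_k XOR input_(k-1)."""
    previous = initial
    out = []
    for b in bits:
        out.append(b ^ previous)
        previous = b
    return out


precoded = precode([1, 0, 1, 1])
```

Each output depends only on the current and the previous input bit, so the precoder itself has no feedback and a single delay element.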


II.  Convolutional  codes  combined  with  GMSK 

We consider combinations of noncatastrophic convolutional codes of
rates 1/2 and 2/3 and precoded GMSK modulators. We assume that when
concatenating convolutional encoders with modulators the initial state of
both circuits is the zero state. Let $S_G$ denote the number of encoder states
and $S_V$ the number of states in the combined Viterbi receiver. The
following lemmas can be formulated for these schemes.

Lemma 1: For the GMSK, BT = 0.5 and BT = 0.4 modulators combined with
the rate 1/2 and rate 2/3 convolutional codes and for every $S_V \ge 4$, there are
exactly two distinct classes of codes (A and B) producing the required value
of $S_V$, namely:

A: $S_G = 1/4 \, S_V$ (3)

B: $S_G = 1/2 \, S_V$ (4)

Lemma 2: For the GMSK, BT = 0.3 and BT = 0.25 modulators combined
with the rate 1/2 convolutional codes and for every $S_V \ge 4$, there is exactly
one class of codes (A) producing the required value of $S_V$, namely:

A: $S_G = 1/4 \, S_V$ (5)

Lemma 3: For the GMSK, BT = 0.3 and BT = 0.25 modulators combined
with the rate 2/3 convolutional codes and for every $S_V \ge 8$, there are exactly
three distinct classes of codes (A, B and C) producing the required value of
$S_V$, namely:

A: $S_G = 1/8 \, S_V$ (6)

B: $S_G = 1/4 \, S_V$ (7)

C: $S_G = 1/2 \, S_V$ (8)

Codes of (4), (5) and (8) are called matched codes (encoders) [5] for the
respective GMSK modulators. The remaining codes are mismatched ones.

III.  Numerical  results 

A  systematic  search  for  best  matched  and  mismatched  short  convolutional 
codes  maximizing  minimum  squared  Euclidean  distance  of  the  coded 
GMSK  schemes  has  been  performed.  Table  1  contains  the  distances  of  the 
best  connections  of  GMSK  signals  and  rate  1/2  codes.  All  schemes 
presented  in  the  table  were  obtained  using  matched  codes.  The  results  show 
that  matched  codes  usually  outperform  mismatched  codes  by  0.5  to  1  dB. 
Coding  gains  over  uncoded  signals  range  from  1.3  to  6.6  dB  for  all 
considered  GMSK  signals  and  code  rates,  increasing  with  the  receiver 
complexity. 

The comparison of the coded GMSK with other binary systems has been
done in terms of the power-bandwidth performance. In particular, it turned
out that the best rate-2/3 coded GMSK schemes with BT = 0.5 found by us perform
nearly the same as the rate-1/2 coded TFM schemes of [4] for Viterbi receivers with
more than 16 states.

Table 1

Normalized minimum squared Euclidean distances of the best rate-1/2
coded GMSK schemes with optimum receivers of up to 128 states.

BT \ S_V    4      8      16     32     64     128
0.5         3.00   4.00   5.91   5.97   7.91   8.87
0.4         3.00   4.00   5.83   5.95   7.83   8.77
0.3         1.12   3.00   4.88   5.77   6.77   7.67
0.25        1.19   3.00   4.82   5.64   6.64   7.52
0.2         -      3.00   3.92   5.19   5.68   7.04
0.15        1.56   3.02   4.56   5.07   6.10
Abstract - It is shown that the use of Gray scramblers
and Gray mapped signal sets are equivalent. A
search is performed for better scramblers, including
a search for scramblers with memory. Memoryless
scramblers are found to give best performance and an
explanation for this is given.

I.  Introduction 

Recent authors have suggested ways in which the BER of
trellis codes can be reduced. In [2] the scrambling of the information
bits with a Gray coder prior to encoding is discussed,
while in [3] and [4] the use of a Gray coded signal set mapper
is examined. We will show that these two methods are equivalent.
We also present a systematic technique based on bounds
for $P_e$ and $P_b$ for finding the best scrambler to be used with a
given trellis code. This search is not limited to combinational
circuits; we also search for scramblers with memory.

II. Algebraic Relation Between Gray Coded Scrambler and Signal Mapper

The Gray coded 8-PSK signal set mapper used in [3] can
be represented as a naturally mapped 8-PSK signal set mapper
preceded by an n x n matrix transformation G. The Gray
coded scrambler considered in [2] precedes the generator matrix
and is represented by the k x k matrix transformation
S. In general, the algebraic relation between an 8-PSK trellis
code with a Gray scrambler and a natural signal mapper, and
an equivalent 8-PSK trellis code based on a Gray coded signal
mapper is

$$S G_n = G_g G \quad (1)$$

where $G_n$ and $G_g$ are the generator matrices for the code with
the naturally mapped signal set and the code with the Gray
coded signal set, respectively. This relationship does not hold
between all the 8-PSK codes in [1] and [3], because in [3] the
authors have found codes with a better $P_e$ than those in [1].
However, it is possible to use (1) to transform the codes of [3]
to equivalent naturally mapped codes which will have a better
$P_e$ than the Ungerboeck codes. Preceded by a Gray scrambler,
the BER performance of the new code will be identical to that of the
Gray mapped code.

III.  Search  Method 

The union bound on $P_b$ is used as a cost function to choose the best
scrambler, so that the effect of the scrambler on an error path is
weighted by its probability. Consider the effect of some scrambler
$s(\cdot)$ on a sequence of correct data c; the input to the encoder
will be $s(c)$, and if an error e occurs the output of the decoder
will be $s(c) + e$, and the output of the descrambler will be
$s^{-1}(s(c) + e)$. If the scrambler is linear we have

$$s^{-1}(s(c) + e) = s^{-1}(s(c)) + s^{-1}(e) = c + s^{-1}(e) \qquad (2)$$

so the scrambling does not affect the correct path. Thus we wish to
find a scrambler s which minimises

$$A = \sum_{e_i \in E'} W(s^{-1}(e_i)) \Pr(e_i) \qquad (3)$$

where $E'$ is a subset of the set $E$ of all error paths, consisting
of only the error paths which have a significant effect on the cost
function A, and $W(\cdot)$ is the Hamming weight of the error path.
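
The linearity property (2) and the cost function (3) can be illustrated
with a small sketch. It assumes a memoryless scrambler acting on one
k-bit block, takes the binary reflected Gray code as the linear map s,
and uses modulo-2 addition (XOR) for "+". The error list and its
probabilities are hypothetical values chosen only for illustration;
they are not taken from the search described here.

```python
def gray(b):
    """Binary reflected Gray code, a linear map over GF(2)."""
    return b ^ (b >> 1)


def inverse_gray(g):
    """Inverse of gray(); also linear over GF(2)."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def hamming_weight(v):
    """Number of ones in the binary expansion of v."""
    return bin(v).count("1")


def cost(descramble, error_paths):
    """Cost (3): sum of W(s^-1(e_i)) * Pr(e_i) over the retained errors."""
    return sum(hamming_weight(descramble(e)) * p for e, p in error_paths)


K = 3  # bits per block (assumption for this example)

# Property (2): s^-1(s(c) + e) = c + s^-1(e) for every c and e.
linear_ok = all(
    inverse_gray(gray(c) ^ e) == c ^ inverse_gray(e)
    for c in range(2 ** K)
    for e in range(2 ** K)
)

# Hypothetical error patterns with hypothetical probabilities.
error_paths = [(0b110, 0.5), (0b011, 0.3), (0b100, 0.2)]
cost_identity = cost(lambda e: e, error_paths)   # no scrambler
cost_gray = cost(inverse_gray, error_paths)      # Gray scrambler
```

For this particular hypothetical list the Gray scrambler lowers the
cost, because the two most probable patterns are mapped to weight-one
vectors; a different list could favour a different scrambler, which is
why a search is needed.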

For any code G there exists an equivalent systematic encoder matrix
$G_{sys}$ such that $G_{sys} = TG$. $G_{sys}$ has a trivial
right-inverse of degree 0, whereas G generally does not. This means
that the error paths produced by $G_{sys}^{-1}$ will have lower degree
than those produced by $G^{-1}$; hence, while scramblers $S_1$ and
$S_2$ give identical performance with generator matrices $G_{sys}$ and
G, respectively, scrambler $S_1$ will have lower degree than $S_2$.
Thus the search for the best scrambler for the code generated by G
should involve first finding the error paths for $G_{sys}$. The best
scrambler S for $G_{sys}$ can then be found, and the best scrambler
for G will then be ST.

IV.  Search  Results 

A search was performed for the best scrambler for $\nu = 3$ systematic
Ungerboeck codes with k varying from 2 to 5. In all cases a memoryless
scrambler was found to give the best performance. The reason for this
can be seen if we look at a list of error vectors ordered according to
probability. It is clear that the best memoryless scrambler found will
reduce the Hamming weight of all vectors, producing an almost ideal
list, i.e., vectors with high probability have low Hamming weight and
vice versa. To get further improvement we must permute a small number
of vectors, leaving most fixed. However, for a k-dimensional vector
space there are at most k invariant subspaces, so it is clear that we
cannot change only a small number of vectors. If we were to use a
nonlinear scrambler we could do this, but then (2) would not hold.

The best scrambler in all cases was found to reduce the BER by
approximately 1/3. This gain is only significant in applications where
the gradient of the BER curve is small, such as low $E_b/N_0$
operating points or on fading channels. For example, the $E_b/N_0$
required to achieve a BER of $10^{-2}$ with the $\nu = 3$ 8-PSK
Ungerboeck trellis code is reduced by 0.25 dB when a scrambler is
used.

Abstract - In this paper we focus on the issue of distribution of
dimensions in time, describing a method, suited to different types of
envelope functions used for modulation, that can generate an almost
arbitrary distribution of dimensions in time with spectral
efficiencies near the Nyquist limit. Subsequently, we propose a
modulation scheme whereby the distribution of dimensions in time is
used to carry additional information in the same Bandwidth (BW).

I.  Introduction 

The dimensionality theorem states that using shift orthogonal
functions for modulation with shift period A and bandwidth B, in a T
seconds interval we can generate at most 2BT dimensions [1]. We are
interested in how these dimensions are distributed in time.
Classically we have had two options: (1) if A = T, the best basis
functions to use are prolate spheroidal wave functions; (2) if A << T,
it is natural to use shift orthogonal functions such as the raised
cosine shaping pulses, or the recently proposed scaling functions and
wavelets [2]. The purpose of this paper is to present systematic
methods based on the theory of wavelets and filter banks to generate
almost arbitrary distributions of dimensions in time, achieving the
highest spectral efficiency in a given BW.

II.  Generation  of  Distribution  of  Dimensions 

We describe here the basic steps of a procedure for the generation of
distribution of dimensions. In the proposed method we use two shift
orthogonal frequency overlapping functions, q(t) and w(t), where q(t)
is a lowpass function while w(t) is a bandpass function. Both q(t) and
w(t) are shift orthogonal with period A. q(t) can be either a scaling
function [2] or an even or odd shift orthogonal function [3]. w(t)
will be, respectively, the function
$w(t) = \sqrt{2}\, q(t) \sin(2\pi t / A)$ or the wavelet associated
with the scaling function q(t).

Step 1: the overlap space between q(t) and w(t) is isolated by
filtering the portion of w(t) that falls on the BW of q(t). For this
purpose, either wavelet packets or nearly ideal low pass filters can
be used, depending on the characteristics of the modulation waveforms.
This operation generates a function o(t) which is shift orthogonal
with shift period LA (L is an integer), spanning a space occupying the
same BW as q(t) yet completely orthogonal to it. This function can be
used to generate additional dimensions in the same BW as q(t).

Step 2: the space spanned by q(t) can be split into orthogonal
frequency channels using the combination of wavelet packets and
multiplicity-M wavelets. The overlap space spanned by o(t) can be
similarly partitioned. This orthogonal frequency channelization can be
extremely flexible [2]. These results are subsequently used to
introduce a novel coded modulation scheme based on the concept that
the way the time-frequency plane is partitioned into orthogonal
frequency channels can carry information.


III.  Application  to  Coded  Modulation 

Suppose we have a two-state modulator which can choose between the
shift orthogonal function $\phi(t)$ with shift period A (state
$\sigma_0$) and two shift orthogonal functions $\phi_1(t)$ and
$\phi_2(t)$ with shift period 2A (state $\sigma_1$). Then the
dimensional rate in a given BW is fixed, but how the dimensions are
distributed in the time-frequency plane differs for states $\sigma_0$
and $\sigma_1$. Consider parsing the source symbols $a_n$ into
nonoverlapping blocks. The state of the modulator can be controlled by
an extra binary data stream, whose rate matches the symbol block rate.

The  switching  of  the  basis  for  two  adjacent  blocks  could 
lead  to  ISI  at  the  boundary  of  the  adjacent  blocks.  However, 
given  the  state  of  the  modulator,  this  ISI  is  deterministic  and 
can  be  remedied. 

The coherent demodulator at the receiver can either operate following
a Maximum Likelihood (ML) detection rule, or perform hierarchical
(suboptimal) demodulation.

The ML detection rule can be formulated to determine the state of the
modulator from the observation of the received signal associated with
the InterSymbol Interference (ISI) free portion of the blocks.
Efficient search for the ML estimate of the $a_n$ can be performed
using the Viterbi algorithm with state complexity of $A^{0.5(L+1)}$
(assuming that L is odd), where A is the alphabet size of the sequence
$a_n$ and L + 1 is the number of samples of the scaling and wavelet
vectors [2]. Once the sequence $a_n$ is detected, assuming that the
receiver operates with very low error probability, we can use the ML
estimated data vector a to estimate the modulator state.

A practical alternative may be to use the correlation properties of
the sampled outputs of the Matched Filters (MFs) at the receiver.
Suppose the receiver employs one set of MFs for each state of the
modulator. Then only the outputs of the correct MFs will be
uncorrelated. Hence, the time-averaged autocorrelation of the output
samples of the MFs can be used to determine the modulator state. Once
the modulator state has been estimated for a given block, the output
of the correct MF is sampled to demodulate the received sequence for
the portion of the block that is not corrupted by ISI. The portions of
the block that may experience ISI are demodulated from the knowledge
of the modulator state in the previous and the present blocks.

All  the  concepts  presented  above  can  be  generalized  to  the 
case  where  there  are  other  orthogonal  channelizations  of  the 
available  spectrum  and  can  further  be  combined  with  channel 
coding. 

Abstract - We show that the performance of an M-Algorithm detector for
linear partial response coded modulation depends critically on phase
and is characterized by the partial energy function of the encoder.

I.  Introduction 

Many practical communication channels may be adequately described by
an equivalent discrete time model

$$r_k = h_0 a_k + \sum_{i=1}^{m} h_i a_{k-i} + n_k \qquad (1)$$

where $a_k$ represents the data symbol, $h_k$ represents an impulse
response, $n_k$ an additive white Gaussian noise (AWGN) component and
m represents the channel memory. The above discrete time model can be
used to construct a trellis. Maximum likelihood sequence estimation
may be performed by searching this trellis with the Viterbi Algorithm
(VA), but its complexity grows exponentially with the length of the
channel impulse response.

A number of reduced search techniques like the M-Algorithm (MA) have
been developed to achieve near optimum performance at a fraction of
the optimum receiver complexity. In applications like mobile
communication, the physical channel must often be characterized as a
non-minimum phase channel. The purpose of this work is to characterize
the effect of non-minimum phase channels on reduced search decoding
complexity.
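
As a rough illustration of the reduced search idea, the following
sketch implements a basic M-Algorithm for the model (1): a
breadth-first tree search that extends every surviving path by each
symbol and keeps only the M paths with the smallest accumulated
squared error. The binary antipodal alphabet, the zero initial channel
state, the example taps and all function names are assumptions made
for this example, not details from the paper.

```python
def channel_output(symbols, h):
    """Noise-free output of model (1); symbols before time 0 are zero."""
    m = len(h) - 1
    return [
        sum(h[i] * symbols[k - i] for i in range(min(m, k) + 1))
        for k in range(len(symbols))
    ]


def m_algorithm(r, h, M, alphabet=(-1.0, 1.0)):
    """Keep the M best paths at each tree level; return the best path."""
    m = len(h) - 1
    survivors = [(0.0, ())]  # (accumulated squared error, symbol path)
    for k, rk in enumerate(r):
        candidates = []
        for metric, path in survivors:
            for a in alphabet:
                ext = path + (a,)
                # Hypothesised channel output for this extended path.
                y = sum(h[i] * ext[k - i] for i in range(min(m, k) + 1))
                candidates.append((metric + (rk - y) ** 2, ext))
        candidates.sort(key=lambda c: c[0])
        survivors = candidates[:M]  # discard all but the M best paths
    return list(survivors[0][1])


h = [1.0, -0.75, 0.125]  # example taps (assumption)
data = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
r = channel_output(data, h)      # noise-free received sequence
detected = m_algorithm(r, h, M=2)
```

With noise added to r, the value of M needed to stay close to MLSE
performance depends on the channel, which is the quantity studied
below.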

One feature that distinguishes channels having identical spectra and
free distance but different phase is the partial energy given by
$E(n) = \sum_{k=0}^{n} |h(k)|^2$. If E(n) represents the partial
energy of any finite duration channel h(n), then

$$E_{max}(n) \le E(n) \le E_{min}(n) \qquad (2)$$

where $E_{min}(n)$ and $E_{max}(n)$ represent the partial energies of
the minimum and maximum phase channels having the same magnitude
frequency response as h(n).
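
The ordering (2) can be checked numerically on a small example. The
sketch below builds three real 3-tap channels with the same magnitude
response by reflecting zeros across the unit circle; the zero
locations 0.5 and 0.25 and the helper names are assumptions chosen for
illustration and are unrelated to the 10 tap class studied below.

```python
def convolve(a, b):
    """Polynomial product of two tap lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def partial_energy(h):
    """Return [E(0), E(1), ...] with E(n) = sum over k <= n of |h(k)|^2."""
    total, out = 0.0, []
    for tap in h:
        total += abs(tap) ** 2
        out.append(total)
    return out


# A factor (1 - z0 z^-1) with |z0| < 1 is minimum phase; replacing it
# by (-z0 + z^-1) moves the zero to 1/z0 and keeps the magnitude
# response unchanged for real z0.
h_min = convolve([1.0, -0.5], [1.0, -0.25])    # both zeros inside
h_mixed = convolve([1.0, -0.5], [-0.25, 1.0])  # one zero reflected
h_max = convolve([-0.5, 1.0], [-0.25, 1.0])    # both zeros reflected

e_min = partial_energy(h_min)
e_mixed = partial_energy(h_mixed)
e_max = partial_energy(h_max)

ordering_holds = all(
    lo <= mid <= hi for lo, mid, hi in zip(e_max, e_mixed, e_min)
)
```

All three channels have the same total energy, but the minimum phase
channel concentrates it in the earliest taps, so its partial energy
curve lies above the others.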

II.  Decoder  Simulation  Results 
Channel phase effects were determined by performing MA decoder tests
on different channels with the same autocorrelation. The results for
one representative 10 tap channel class [2], having one real zero and
4 pairs of complex conjugate zeros, are described here. The class is
specified by the normalized 99% bandwidth (NBW) and minimum distance
loss (MDL)

d2, 

measured  by  MDL  =  10log10  • 

The minimum phase channel, maximum phase channel and 4 mixed phase
channels belonging to this class were chosen for performing MA tests.
The partial energy curves and column distance profiles of these
channels are plotted in Figures 1(a) and 1(b), respectively. MA
simulations were carried out on these channels and the complexity was
measured in terms of the minimum number of paths (M) needed by the
decoder at each tree level in order to achieve near-MLSE performance.
The complexity required by each of the channels is summarized in
Table 1. The minimum phase channel (a) needs the lowest value of M
(4 paths) while the maximum phase channel (f) needs the highest
complexity (M = 128 paths). Channels that have similar partial energy
curves turn out to require the same complexity. The partial energy
curves of any one channel class show groups of channels having similar
curves, and the complexity required by the MA decoder increases as we
move from one group to another one lower in the partial energy
picture.

Figure 1: (a) Partial energy curves (b) Distance profile for selected
channels of 10 tap class (NBW = 0.36, MDL = 0.19 dB).

Channel                  a      b      c      d      e      f
Number of paths (M)      4      5     18     32     32    128

Table 1: MA decoder results for 10 tap equivalence class.

1 This work was partly supported by General Electric Corporate
Research and Development Center, Schenectady, New York.

We have analyzed many channels (i.e., sets of $\{h_k\}$) and all show
a behaviour [1] similar to Figure 1, except that the number of partial
energy groups varies from 1 to 8. A superior partial energy curve and
distance profile guarantees lower MA decoder complexity, but we see
that partial energy serves as a better indicator of performance than
the distance profile.

Abstract - In this paper we investigate the applicability of
simplified decoders of convolutional codes to the case of multilevel
coding [1]. System behaviour is examined by means of minimum distance
analysis and simulation.

I.  Problem  statement 

The objective of our research is to investigate the applicability of
reduced complexity algorithms for the decoding of multilevel codes
combined with multi-resolution QAM, as proposed for the terrestrial
transmission of HDTV signals in Europe.

The multistage decoder shown in Fig. 1 is examined in the paper. In
this figure only the inner level of coding and modulation is shown.
Other elements of the system are omitted [3].
Simplifications of the receiver are based on two different approaches:
the M-algorithm [4], which is the optimum solution for searching a
limited part of the trellis, and the RSSE algorithm, which is not
optimum but is easier to build in hardware than the M-algorithm. Both
of these solutions use a smaller number of states than the Viterbi
algorithm.

The following benefits can potentially be achieved via the simplified
algorithms in the receiver:

a) reduction of the complexity of the decoder (reduction of the total
cost of the system);

b) additional performance gain, for fixed receiver complexity, by the
proper choice of the code structure in the transmitter.

The  main  purpose  of  the  paper  is  to  see  if  there  is  an  additional 
coding  gain  achievable  via  the  use  of  the  multistage  decoders  based 
on  the  M-algorithm  and  RSSE  [4]  approach  and  if  so,  how  large  it 
is.  Analysis  is  done  on  the  basis  of  asymptotic  coding  gain 
(minimum  distances).  Numerical  results  of  computer  simulation  are 
also  provided. 


II.  Numerical  results 

Firstly, we examine by simulation the degradation of performance due
to simplifications of decoding. An example of numerical results is
shown in Fig. 2. These are simulated bit error rates for convolutional
codes of rates $r_0 = 1/3$ and $r_1 = 2/3$ decoded by the RSSE
algorithm, for the system of Fig. 1. Losses in this case are about
1 dB for a reduction from 64 states to 32 states on the Gaussian
channel. Results for the Rayleigh channel are also provided.
Typically, it turns out that for the Rayleigh channel and a complexity
reduction by a factor greater than 2, losses are significant (greater
than 3 dB).
For concatenated coding systems it is very important to investigate
the properties of error bursts at the output of the decoder. We have
analyzed the distribution of the average value of burst length.
Numerical results for different rates and complexity reductions are
provided for Gaussian and for Rayleigh channels. Typically, the length
of bursts at the output of decoders with reduced complexity increases
with decreasing number of decoder states. Additionally, for multistage
decoding the average burst length is up to 3 times greater than for
single stage coding.

Fig. 1. Block diagram of multistage decoder of multilevel coding of
4 QAM.

Fig. 2. Simulated bit error rate for RSSE decoding of multilevel
convolutional codes with rates 1/3 and 2/3 combined with 4 QAM.

Table 1. Comparison of the values of average burst length for
simplified decoding of single level convolutional codes (r=1/3,
r=2/3) with multilevel coding. Results for constant value of bit error
rate (BER = 10^-3) and different number of states of the decoder
V_rec.

V_rec [states]              64      32      16       8
B_r,V [bits]:
  r=1/3 RSSE AWGN           10      20      50     150
  r=2/3 RSSE AWGN           10      60     150     300
  (1/3, 2/3) RSSE AWGN      30     100     300     750

Abstract - A new general criterion for the optimal design of
(possibly)  time-varying  and  nonlinear  trellis-codes  for  reliable 
transmission  of  digital  information  sequences  over  arbitrary 
(possibly)  time-varying  Discrete  Memoryless  Channels 
(DMCs)  is  presented.  The  criterion  is  derived  on  the  basis  of 
new  tight  generally  time-varying  analytical  upper  bounds 
developed  for  the  performance  evaluation  of  MAP  decoders 
with  finite  decoding  constraint-lengths  which  minimise  the 
symbol-error  probability.  New  procedures  related  to  the 
proposed  criterion  are  also  presented,  allowing  a  direct 
construction  of  good  trellis-codes  for  any  arbitrary  DMC  and 
for  any  assigned  value  of  the  decoding  constraint-length. 

Summary 

The common design criterion for trellis-codes requires the
maximisation of the minimum Hamming distance (the so-called
"free-distance") between codewords. Although this criterion is widely
used in practical applications, its validity is not quite general. In
fact it is well known that, at least in principle, it is optimal only
in the case when the employed trellis-code is linear (i.e., it is a
convolutional code), the assigned DMC is time-invariant, binary,
symmetric and with a very small cross-over probability, and a sequence
Maximum Likelihood (ML) decoder with infinite decoding
constraint-length A is present at the receiver site [1]. Apart from
this case, the general issue of "good" trellis-code design for
arbitrary noisy DMCs seems not yet well explored in the literature.

In this contribution a novel general criterion is presented for the
optimal design of trellis-codes (in general, nonlinear and
time-varying) for arbitrary noisy DMCs (in general, non-binary,
non-symmetric, time-varying and characterised by an arbitrary
error-rate) when a decoder which minimises the symbol-decoding-error
probability (i.e., a symbol-by-symbol MAP decoder) with an assigned
and limited value A of the decoding constraint-length is employed.

The range of application of the proposed criterion is wider than that
of the other criteria known in the literature. In particular, the
validity of the mentioned criterion is not restricted to the class of
linear trellis-codes (i.e., of the convolutional codes) and of
symmetric DMCs; moreover, it allows the value A assigned to the
decoding constraint-length to be taken into account explicitly. The
presented criterion is based on the following (generally) time-varying
upper-bound derived in [3] as an application of the Chebyshev
inequality to the performance evaluation of symbol-by-symbol MAP
decoders:

$$P\big(\xi(k) \ne \hat{\xi}_{MAP}(k \mid k+A)\big) \le 2\,\mathrm{Tr}\{S(k \mid k+A)\}, \quad k \ge 1. \qquad (1)$$

In (1) the Markov chain $\{\xi(k), k \ge 1\}$ is the so-called
"state-transition sequence" of the trellis-encoder (defined as in
[4, Sect. II]) and $\{\hat{\xi}_{MAP}(k \mid k+A), k \ge 1\}$ is the
corresponding (optimal) MAP estimate sequence (computed recursively as
in [2]) when the decoding constraint-length takes on the value A.
Moreover, $\mathrm{Tr}\{S(k \mid k+A)\}$ is the trace of the average
covariance error matrix $S(k \mid k+A)$ of the so-called "fixed-lag
basic smoother" [2], and the sequence $\{S(k \mid k+A), k \ge 1\}$ can
be recursively computed with respect to (wrt) the index k on the basis
of a Riccati-type equation (formally similar to the well-known
equation employed for the computation of the mean square error
performance of a conventional Kalman filter), as shown in [3]. It must
be remarked [3] that the sequence $\{\mathrm{Tr}\{S(k \mid k+A)\}\}$
jointly depends on the sequence of the probability transition matrices
of the assigned noisy DMC and on the set of the codewords of the
employed trellis-code; therefore, the minimisation of the upper-bound
sequence of (1) wrt the admissible sets of codewords gives a fully
general criterion for the synthesis of good trellis-codes for any
assigned value of A and for any arbitrary noisy DMC.

Procedures based on the described criterion for the construction of
good trellis-codes with assigned rate R = b/n and encoding
constraint-length L (defined as in [1]) have been implemented via
computer [3]; for illustrative purposes, the trellis diagrams of the
best trellis-codes with rate R = 1/2 and L = 2 obtained by means of an
application of the mentioned construction procedure are reported in
the Figures for some simple cases of stationary binary DMCs with
transition probabilities p = P(1|1) and q = P(0|0). The two cases
A = 0 and A = 2 have been considered in (a),(b),(c) and (d),(e),(f)
respectively. In the Table the steady-state value of the sequence
$\{\mathrm{Tr}\{S(k \mid k+A)\}\}$ (denoted by
$\mathrm{Tr}\{S(\infty \mid \infty+A)\}$) is reported, together with
the corresponding average bit-error-rate (BER) (evaluated by Monte
Carlo simulations) of the encoders generating the presented codes
(bold numbers denote that the code has been optimized for A = 0 or
A = 2).

On  the  basis  of  our  analysis  [3],  some  conclusions  can  be  drawn: 

- for an assigned DMC, the best trellis code for the case A = 0 does
not always agree with the best for A = 2; in fact, in general, the
topology and/or the labelling of the optimal trellis-code change with
the value assumed by the decoding constraint-length; moreover, a value
of A nearly equal to the encoding constraint-length L results in a
negligible degradation wrt the optimum performance (ideally obtained
for $A \to \infty$);

- the topology and/or the labelling of the optimal trellis-code
strongly depend on the statistical properties of the assigned DMC.

Identification via Compressed Data


Rudolf  Ahlswede,  En-liu 

I.  Introduction 

In this paper, a combined problem of source coding and identification
is considered. To put our problem in perspective, let us first review
the traditional problem in source coding theory. Consider the diagram
of Fig. 1, where $\{X_n\}_1^\infty$ is an i.i.d. source taking values
on a finite alphabet $\mathcal{X}$. The encoder output is a binary
sequence which appears at a rate of R bits per symbol. The decoder
output is a sequence $\{\hat{X}_n\}_1^\infty$ which takes values on a
finite reproduction alphabet $\mathcal{Y}$. In traditional source
coding theory, the decoder is required to be able to recover
$\{X_n\}_1^\infty$ completely or with some allowable distortion. That
is, the output $\{\hat{X}_n\}_1^\infty$ must satisfy

$$\frac{1}{n} \sum_{i=1}^{n} \rho(X_i, \hat{X}_i) \le d \qquad (1)$$

for sufficiently large n, where
$\rho : \mathcal{X} \times \mathcal{Y} \to [0, +\infty)$ is a
distortion measure and d > 0 is the allowable distortion. The problem
is then to determine the infimum of rate R such that the system shown
in Fig. 1 can operate in such a way that (1) is satisfied. From rate
distortion theory, this infimum is given by the rate distortion
function of the source $\{X_n\}_1^\infty$.

Figure 1: Model for source coding

Let us now consider the system shown in Fig. 2. The sequence
$\{Y_n\}_1^\infty$ is a sequence of i.i.d. random variables taking
values on $\mathcal{Y}$. Knowing $\{Y_n\}$, the decoder is now
required to be able to identify whether or not the distortion between
$\{X_n\}$ and $\{Y_n\}$ is less than or equal to d, in such a way that
two kinds of error probabilities satisfy some prescribed conditions.
The problem we are now interested in is still to determine the infimum
of rate R such that the system shown in Fig. 2 can operate in this
way.

Figure 2: Model for joint source coding and identification.
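
To make the event being identified concrete, the following sketch
evaluates the per-letter average distortion between two sequences and
the yes/no answer the decoder is asked to reproduce. It assumes the
Hamming distortion as the measure and uses hypothetical binary
sequences; the function names are illustrative. The sketch compares
the uncompressed sequences directly, whereas the decoder in Fig. 2
only sees a compressed description of the source sequence.

```python
def hamming(a, b):
    """Hamming distortion: 0 if the letters agree, 1 otherwise."""
    return 0 if a == b else 1


def average_distortion(x, y, rho=hamming):
    """Per-letter average distortion (1/n) * sum of rho(x_i, y_i)."""
    if not x or len(x) != len(y):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(rho(a, b) for a, b in zip(x, y)) / len(x)


def within_distortion(x, y, d, rho=hamming):
    """True when the average distortion between x and y is at most d."""
    return average_distortion(x, y, rho) <= d


# Hypothetical length-8 binary sequences differing in two positions.
x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [0, 1, 0, 0, 1, 0, 1, 1]

distortion = average_distortion(x, y)
answer_loose = within_distortion(x, y, d=0.25)
answer_tight = within_distortion(x, y, d=0.2)
```

The two kinds of error correspond to the decoder answering "no" when
this function would return True, and "yes" when it would return False.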

II.  Formal  Formulation  of  Problem 

Let $\{(X_n, Y_n)\}_1^\infty$ be a sequence of independent drawings of
a pair (X, Y) of random variables taking values on
$\mathcal{X} \times \mathcal{Y}$ with joint distribution $P_{XY}$. Fix
$0 < d < E\rho(X,Y)$. An nth-order identification (ID) code $C_n$ is
defined to be a triple $C_n = (f_n, B_n, g_n)$, where
$B_n \subset \{0,1\}^*$ is a prefix set, $f_n$ (called an "encoder")
is a mapping from $\mathcal{X}^n$ to $B_n$, and $g_n$ (called a
"decoder") is a mapping from $\mathcal{Y}^n \times B_n$ to $\{0,1\}$.
When $C_n$ is used in the system shown in Fig. 2, its performance can
be measured by the following three quantities: the resulting average
rate defined by $r_n(C_n) = E\, n^{-1}$(the length of $f_n(X^n)$), the
first kind of error probability defined by
$p_{e1}(C_n) = \Pr\{g_n(Y^n, f_n(X^n)) = 0 \mid \rho_n(X^n, Y^n) \le d\}$,
and the second kind of error probability defined by
$p_{e2} = \Pr\{g_n(Y^n, f_n(X^n)) = 1 \mid \rho_n(X^n, Y^n) > d\}$.

Let $R \in [0,+\infty)$, $\alpha \in (0,+\infty]$ and
$\beta \in (0,+\infty]$. A triple $(R, \alpha, \beta)$ is said to be
achievable if for any $\epsilon > 0$, there exists a sequence
$\{C_n\}$ of ID codes, where $C_n = (f_n, B_n, g_n)$ is an nth-order
ID code, such that for sufficiently large n,

$$r_n(C_n) \le R + \epsilon, \quad p_{e1} \le 2^{-n(\alpha-\epsilon)} \quad \text{and} \quad p_{e2} \le 2^{-n(\beta-\epsilon)},$$

where as a convention, $\alpha = +\infty$ ($\beta = +\infty$, resp.)
means that the first (second, resp.) kind of error probability of
$C_n$ is equal to 0. Let $\mathcal{R}$ denote the set of all
achievable triples. In this paper, we are interested in determining
the closure $\overline{\mathcal{R}}$ of $\mathcal{R}$. Specifically,
we define for each pair $(\alpha, \beta)$, where
$\alpha, \beta \in [0,+\infty]$,

$$R^*_{XY}(\alpha, \beta, d) = \inf\{R \mid (R, \alpha, \beta) \in \overline{\mathcal{R}}\}.$$

Our main problem is then the determination of the function
$R^*_{XY}(\alpha, \beta, d)$.

III.  Main  Results 

Assume that X and Y are independent. For any $0 < d < E\rho(X,Y)$,
define $\theta(d)$ by $\theta(d) = \inf D(P \| P_{XY})$, where the
infimum is taken over all distributions P on
$\mathcal{X} \times \mathcal{Y}$ such that
$\sum_{x,y} P(x,y)\rho(x,y) \le d$. Let U be a random variable taking
values on some finite set $\mathcal{U}$. Let $P_{XU}$ denote the joint
distribution of X and U. For any $\alpha > 0$, define

$$\mathcal{E}(P_{XU}, \alpha, d) = \inf\{D(P_{\tilde{Y}} \| P_Y) + I(U; \tilde{Y})\},$$

where the infimum is taken over all random variables $\tilde{Y}$
taking values on $\mathcal{Y}$ such that $E\rho(X,\tilde{Y}) \le d$
and $D(P_{\tilde{Y}} \| P_Y) + I(XU; \tilde{Y}) \le \theta(d) + \alpha$.
Here we make use of the convention that the infimum taken over an
empty set is $+\infty$. For any $\beta > 0$, let
$R(P_X, P_Y, \alpha, \beta, d)$ be the infimum of $I(X;U)$ over all
random variables U such that
$\mathcal{E}(P_{XU}, \alpha, d) \ge \beta$, and let

$$R(P_X, P_Y, \alpha, 0, d) = \lim_{\beta \to 0^+} R(P_X, P_Y, \alpha, \beta, d).$$

The following theorem gives a general formula for
$R^*_{XY}(\alpha, \beta, d)$.

Theorem 1 For any $0 < d < E\rho(X,Y)$, $0 < \beta < \theta(d)$, and
$\alpha \in (0, +\infty]$, the following holds:

$$R^*_{XY}(\alpha, \beta, d) = \bar{R}(P_X, P_Y, \alpha, \beta, d),$$

where

$$\bar{R}(P_X, P_Y, \alpha, \beta, d) = \lim_{\beta' \to \beta^-} R(P_X, P_Y, \alpha, \beta', d).$$

The converse part of Theorem 1 is related to the general isoperimetric
problem. During the process of proving the converse part, we develop a
new powerful method for proving converses in multi-user information
theory. For more details, please refer to [1].

Abstract - An asymptotic expression is derived for the Fisher
information of the sum of two independent random variables X and
$Z_\epsilon$ when $Z_\epsilon$ is small, under some regularity
conditions on the density of X and conditions on the moments of
$Z_\epsilon$. Using this result for the case $Z_\epsilon = \epsilon Z$,
some asymptotic generalization of De Bruijn's identity is obtained.


for some integer m > 1. $A_m(X, \{E(Z_\epsilon^k)\})$ depends on f(x)
and $E(Z_\epsilon^k)$, k = 2, ..., m. For m = 2, (2) becomes

$$J(X + Z_\epsilon) = J(X) + L(X)\epsilon^2 + o(\epsilon^2), \quad \epsilon \to 0, \qquad (3)$$

where L(X) is an integral expression involving f(x) and its first
three derivatives. For example, if X is Gaussian with variance
$\sigma^2$, the above expansion yields:


1.  Introduction 

The Fisher information of a random variable $Y$ with absolutely
continuous density $f_Y$ is given by

$$J(Y) = \int \left( \frac{f_Y'(y)}{f_Y(y)} \right)^2 f_Y(y)\, dy. \qquad (1)$$


It plays an important role in information theory and statistics.
Under certain regularity assumptions, the Fisher information
of an additive noise random variable characterizes the main
term in the asymptotic expansion of the Shannon mutual
information between the input and output signal of an additive
noise channel when the input signal is weak [1,2,3]. Fisher
information also appears in the well-known Cramér-Rao
inequality.
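
As a rough numerical illustration of definition (1), the following sketch approximates the integral on a grid. It is not taken from the paper: the function names, the grid size and the truncation at 12 standard deviations are assumptions made for this example. It uses the Gaussian case because the answer is known in closed form: a Gaussian with variance $v$ has Fisher information $1/v$, and the sum of independent Gaussians with variances $\sigma^2$ and $\epsilon^2$ is Gaussian with variance $\sigma^2 + \epsilon^2$.

```python
import numpy as np

def fisher_information(pdf, dpdf, lo, hi, n=200_001):
    """Trapezoidal approximation of J = integral of (f'(y)/f(y))^2 f(y) dy on [lo, hi]."""
    y = np.linspace(lo, hi, n)
    integrand = dpdf(y) ** 2 / pdf(y)      # (f'/f)^2 * f simplifies to f'^2 / f
    dy = y[1] - y[0]
    return float(dy * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])))

def gaussian_fisher(var):
    """Fisher information of a zero-mean Gaussian density with the given variance."""
    pdf = lambda y: np.exp(-y ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    dpdf = lambda y: -y / var * pdf(y)     # derivative of the Gaussian density
    s = np.sqrt(var)
    return fisher_information(pdf, dpdf, -12 * s, 12 * s)

sigma2, eps2 = 1.0, 0.01                   # hypothetical variances of X and of the perturbation
print(gaussian_fisher(sigma2))             # close to 1 / sigma2
print(gaussian_fisher(sigma2 + eps2))      # close to 1 / (sigma2 + eps2)
print(1 / sigma2 - eps2 / sigma2 ** 2)     # second-order expansion in eps
```

For small $\epsilon$ the last two printed values differ only at order $\epsilon^4$, which is the kind of behaviour an expansion in powers of $\epsilon$ describes.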


II.  Problem  Formulation 

If $Y = X + Z$, with $X$ and $Z$ independent random variables,
an explicit calculation of the integral (1) is impossible in
general. Therefore, it is of interest to investigate the asymptotic
behavior of $J(Y)$, when the perturbation $Z$ of $X$ is weak in
the sense that $Z = Z_\epsilon$ and $E(Z_\epsilon^2) = \epsilon^2 \to 0$. In this paper
we derive an asymptotic expansion for the Fisher information
$J(X + Z_\epsilon)$ in terms of the probability density function
(pdf) of $X$ and higher moments of $Z_\epsilon$, if certain conditions
are satisfied. The similar problem of deriving an asymptotic
expression for the differential entropy $h(X + Z_\epsilon)$ of the sum of
two independent random variables $X$ and $Z_\epsilon$ when $Z_\epsilon$ is small
has been investigated in [4].

III.  Main  Result 

Without loss of generality we assume $E(X) = E(Z_\epsilon) = 0$.
Suppose $E(Z_\epsilon^2) = \epsilon^2$, and $E|Z_\epsilon/\epsilon|^{n+\gamma} < c < \infty$ for some
integer $n > 2$, some constant $c$ and $0 < \gamma < 1$. Let $X$ have
a bounded pdf $f_X(x) = f(x)$, which has bounded continuous
derivatives $f^{(k)}(x)$ for $k = 1, \ldots, n + 2$. Then, under some
additional conditions on $f(x)$ (which hold for a large class
of smooth densities), and if $X$ and $Z_\epsilon$ are independent, the
following asymptotic expansion holds as $\epsilon \to 0$:

$$J(X + Z_\epsilon) = J(X) + A_m(X, \{E(Z_\epsilon^k)\}) + o(\epsilon^m) \qquad (2)$$


$$J(X + Z_\epsilon) = \frac{1}{\sigma^2} - \frac{\epsilon^2}{\sigma^4} + o(\epsilon^2). \qquad (4)$$


IV.  Some  asymptotic  generalization  of  De 
Bruijn’s  identity 

For the special case $Z_\epsilon = \epsilon Z$ the asymptotic expansion (2)
can be written as

$$J(X + \epsilon Z) = J(X) + B_m(X, \{E(Z^k)\}, \{\epsilon^k\}) + o(\epsilon^m) \qquad (5)$$

where $B_m$ depends on $f(x)$, the moments $E(Z^k)$ and the powers
$\epsilon^k$, $2 \le k \le m$. Also, when $Z$ has a Gaussian distribution
with unit variance, $X$ has a pdf with finite variance, and $X$
and $Z$ are independent, De Bruijn's identity holds:


$$\frac{d\, h(X + \epsilon Z)}{d(\epsilon^2)} = \frac{1}{2} J(X + \epsilon Z). \qquad (6)$$


Thus, by using the expansion (5) if $Z$ is Gaussian and substituting
it in the integral version of (6), we obtain an asymptotic
expansion for $h(X + \epsilon Z)$. This expansion coincides with
the expansion for $h(X + \epsilon Z)$ obtained in [4] if $Z$ is Gaussian.
Moreover, by comparing the asymptotic expansion for
$h(X + \epsilon Z)$ in [4] with the one derived here for $J(X + \epsilon Z)$ for
non-Gaussian $Z$, we obtain an asymptotic generalization of
De Bruijn's identity to the case where $Z$ is non-Gaussian.

I. Introduction

The well known Brunn-Minkowski Inequality (BMI) is one of
the basic inequalities in geometry. Its formal statement is the
following. Let $A_1$ and $A_2$ be two sets in $\mathbb{R}^d$. Then,

$$\mu(A_1 + A_2)^{1/d} \ge \mu(A_1)^{1/d} + \mu(A_2)^{1/d}, \qquad (1)$$

where $\mu(A) = \int_{x \in A} dx$ is the ($d$-dimensional) volume of $A$,
and $A_1 + A_2 = \{x + y : x \in A_1, y \in A_2\}$ is the Minkowski
sum of $A_1$ and $A_2$. This sum may be interpreted as the geometric
convolution of the two regions. Equality in (1) holds if
the two regions are convex and proportional, e.g., if they are
balls or cubes (with parallel edges). For $d = 1$, this condition
is reduced to the simple case where $A_1$ and $A_2$ are intervals
(and not, e.g., a union of intervals).
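
Inequality (1) can be checked by hand for axis-aligned boxes, because the Minkowski sum of two such boxes is again an axis-aligned box whose edge lengths are the sums of the corresponding edge lengths. The sketch below is an illustration added for this special case only; the function names and the example edge lengths are assumptions of the example, not part of the paper.

```python
import numpy as np

def box_volume(edges):
    """Volume of an axis-aligned box given its edge lengths."""
    return float(np.prod(edges))

def bmi_gap(a, b):
    """Left side minus right side of the BMI for two axis-aligned boxes.

    a, b: edge lengths of the two boxes (same dimension d).
    The Minkowski sum of the boxes is the box with edge lengths a + b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    d = a.size
    lhs = box_volume(a + b) ** (1.0 / d)
    rhs = box_volume(a) ** (1.0 / d) + box_volume(b) ** (1.0 / d)
    return lhs - rhs

print(bmi_gap([1, 4], [2, 2]))        # positive: the boxes are not proportional
print(bmi_gap([1, 1, 1], [2, 2, 2]))  # essentially zero: proportional cubes
```

The second call illustrates the equality case for cubes with parallel edges mentioned above.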

The BMI is dual in some sense to the Entropy-Power
Inequality (EPI) [1], which lower bounds the entropy-power of
the sum of independent random variables. In [2] a matrix
form for the EPI was derived, and some of its applications
have been pointed out. Analogously, we derive in this work a
matrix form for the BMI, and discuss its applications.

II.  Linear  Transformation  of  Sets  and  the 
Matrix  BMI 

We first introduce the matrix form of the Minkowski sum.
Let $A = (A_1 \ldots A_n)^t$ be a vector, whose $n$ components are
$d$-dimensional sets. We define a linear transformation of
$A_1 \ldots A_n$ as

$$TA = \{Tx : x_i \in A_i \text{ for } i = 1 \ldots n\}, \qquad (2)$$

where $T$ is an $m \times n$ matrix. In particular, $tA$ means scaling
the coordinates of $A$ by the scalar $t$. Note that $TA$ is an
$md$-dimensional shape. Denote the volumes of the shapes by
$\mu(A_i) = \mu_i$, $i = 1 \ldots n$. Following simple laws of integration,
the $md$-dimensional volume of $TA$, in the particular case $m = n$,
is $\mu(TA) = |T|^d \cdot \mu(A) = |T|^d \cdot \prod_{i=1}^{n} \mu_i$, where $|\cdot|$ denotes
the absolute value of the determinant. For the general case,
we suggest the following matrix generalization of the BMI:

Theorem 1 (Matrix-BMI): Let $\tilde{A} = (\tilde{A}_1 \ldots \tilde{A}_n)^t$ be a
vector of $d$-dimensional cubes whose edges parallel the axes,
and whose volumes are the same as of $A_1 \ldots A_n$, i.e., $\mu(\tilde{A}_i) = \mu_i$,
$i = 1 \ldots n$. Then

$$\mu(TA)^{1/d} \ge \mu(T\tilde{A})^{1/d} = \sum_{i=1}^{\binom{n}{m}} |\tilde{T}_i| \qquad (3)$$

where $\tilde{T} = T \cdot L$, $L$ is an $n \times n$ diagonal matrix whose diagonal
elements are $\mu_1^{1/d} \ldots \mu_n^{1/d}$ (the edges' lengths of the cubes
$\tilde{A}_1 \ldots \tilde{A}_n$), and $\{\tilde{T}_i, i = 1 \ldots \binom{n}{m}\}$ is the set of all possible
$m \times m$ sub-matrices of $\tilde{T}$, obtained by choosing $m$ out of the
$n$ columns of $\tilde{T}$.

For $m = 1$, (3) reduces to
$\mu\left(\sum_{i=1}^{n} t_i A_i\right)^{1/d} \ge \sum_{i=1}^{n} |t_i|\, \mu(A_i)^{1/d}$,
i.e., to the regular BMI (1). Equality in (3)
holds in each one (or in a mixture) of the following cases: if
$A_1 \ldots A_n$ are cubes whose faces parallel each other; if (after
removing the all zero columns of $T$, if any) $m = n$, or if $T$
does not have a full row rank, where then $\mu(TA) = 0$. Theorem 1
is proved via a double induction over the dimensions
of $T$, using a conditional form of the BMI, analogously to the
proof of the matrix-EPI in [2].

°This research was supported in part by the Wolfson Research
Awards, administered by the Israel Academy of Science and
Humanities.

In order to appreciate the usefulness of Theorem 1, consider
the following example. Let $A = (A_1 \ldots A_n)^t$ and
$B = (B_1 \ldots B_n)^t$, where $A_1 \ldots A_n, B_1 \ldots B_n$ are $d$-dimensional
shapes of unit volume, and let $T_1$ and $T_2$ be $n \times n$ matrices.
Consider the volume of the sum $T_1 A + T_2 B$. A direct
application of the regular BMI (1) gives

$$\mu(T_1 A + T_2 B)^{1/nd} \ge \mu(T_1 A)^{1/nd} + \mu(T_2 B)^{1/nd} = |T_1|^{1/n} + |T_2|^{1/n}. \qquad (4)$$

On the other hand, we may view the sum $T_1 A + T_2 B$ as a
transformation of the $2n$-dimensional vector $(A_1 \ldots A_n, B_1 \ldots B_n)^t$
by the $n \times 2n$ matrix $T = (T_1; T_2)$. Theorem 1 may then be
used to obtain

$$\mu(T_1 A + T_2 B)^{1/d} \ge \mu(T_1 \tilde{A} + T_2 \tilde{B})^{1/d} = \sum_{i=1}^{\binom{2n}{n}} |(T)_i|. \qquad (5)$$

But, by Theorem 1, for $A_1 \ldots B_n$ cubes of unit volume, (5)
becomes an equality, while (4) remains an inequality (unless
$T_1$ and $T_2$ are proportional). We conclude, then, that (5) is a
tighter lower bound than (4).

As in the case of other information-theoretic inequalities
[1], the new matrix BMI can be used to derive interesting
inequalities for determinants. One such example is the
inequality just discussed between the right hand sides of (4)
and (5). To obtain another inequality, we apply the matrix
BMI to a linear transformation of rectangular parallelepipeds
while substituting the expression for its exact volume (which
is computable in this case). Finally, we note that the matrix
BMI can be used to lower bound the volume of a projection
of a lattice cell, and so it can find applications in calculating
the effective number of codewords of lattice constellations or
lattice quantizers satisfying spectral constraints.

Consider the following model of a permutation channel. In
each time unit two sources produce one bit each (0 or 1 with
probability P(0) = P(1) = 0.5). These two bits arrive at an
organizer, who in the same time unit has to output one bit.
The other bit he may store in some memory device. Whenever
possible, the output bit must be a 1. So if the arriving bits are
11, 10 or 01, then the organizer will send a 1 for sure. If both
sources produce a 0, then the organizer may send a 1 that
is stored in the memory device (and the size of the memory
will be reduced by one bit in this case). If this is not possible,
then the organizer must send a 0.

A natural question is: How much influence does the size
of the memory have on the behaviour of the sequence of bits
transmitted by the organizer? As a simple measure for the
influence of the memory we consider the expected time of the
first occurrence of a 0 in this sequence.

If there is no memory at all, then this expected value
is 4, since in this case a 0 is transmitted in each time unit
with probability 0.25, so the time of the first 0 has a geometric
distribution with parameter 0.25.

If the memory device can store every incoming bit (i.e.,
the size of the memory is linear in time), it turns out
that this expected value does not exist. To see this, observe
that the bits produced by the two sources yield a sequence $(b(j))$
of 1's and -1's, if we represent a 0 by a -1 and let
the bits produced by the first source take the odd positions
and the bits produced by the second source take the even
positions in the sequence. Two necessary conditions for the
occurrence of the first 0 at time $t$ are i) $\sum_{j=1}^{2(t-1)} b(j) = 0$ (i.e.,
the memory is exhausted at time $t - 1$) and ii) all partial sums
$\sum_{j=1}^{2i} b(j)$, $i = 1, \ldots, t - 2$, are nonnegative (i.e., no 0 has
been transmitted before). By the Ballot Theorem the number
of {1, -1}-sequences fulfilling i) and ii) is just the number
$\frac{1}{t+1}\binom{2t}{t}$. Since there are $4^t$ possible sequences $(b(j))_{j=1}^{2t}$ until
time $t$, the probability that the first 0 occurs at time $t$ is
$\binom{2t}{t} / ((t+1)4^t)$. By Stirling's formula $\binom{2t}{t} \sim 4^t/\sqrt{\pi t}$, and the expected
value for the first occurrence of a 0, $\sum_{t=1}^{\infty} \binom{2t}{t} \frac{t}{(t+1)4^t}$, does not
exist, since the single summands are of order $1/\sqrt{t}$.
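
The counting argument for unbounded memory can be checked by brute-force dynamic programming over the memory content. The sketch below is an illustration added here, not the paper's method: it assumes the memory starts empty, and the function name is hypothetical. It counts, for each $t$, the source sequences for which the organizer sends only 1's up to time $t - 1$ and is forced to send a 0 at time $t$, and compares the count with the Catalan number $\binom{2t}{t}/(t+1)$.

```python
from math import comb

def first_zero_counts(t_max):
    """counts[t] = number of source outputs (two bits per time unit, t units)
    for which the first transmitted 0 occurs exactly at time t, when the
    memory is unbounded and initially empty (an assumption of this sketch)."""
    counts = {}
    state = {0: 1}  # memory size -> number of histories with all-ones output so far
    for t in range(1, t_max + 1):
        # A 0 is forced at time t exactly when the memory is empty and 00 arrives.
        counts[t] = state.get(0, 0)
        nxt = {}
        for m, c in state.items():
            nxt[m] = nxt.get(m, 0) + 2 * c          # inputs 01 or 10: memory unchanged
            nxt[m + 1] = nxt.get(m + 1, 0) + c      # input 11: one 1 is stored
            if m > 0:
                nxt[m - 1] = nxt.get(m - 1, 0) + c  # input 00: a stored 1 is sent
        state = nxt
    return counts

counts = first_zero_counts(8)
for t, c in counts.items():
    print(t, c, comb(2 * t, t) // (t + 1), c / 4 ** t)
```

The partial sums of $t \cdot \text{counts}[t]/4^t$ keep growing roughly like $\sqrt{t}$, which is the numerical face of the divergence stated above.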

If the size of the memory is limited by some constant $k$,
the probability for the occurrence of the first 0 at time
$t$ is $a(0, t-1)/4^t$, where $a(0, t - 1)$ is the number of all sequences
produced by the two sources leading to the all-one sequence of
bits transmitted by the organizer with memory size 0 at time
$t - 1$.

Analogously, $a(m, t)$ is defined for every time $t = 1, 2, \ldots$
and memory size $m = 0, \ldots, k$. In each time unit the source
outputs 01 and 10 do not change the size of the memory, 00
decreases the memory by one bit (and is forbidden for $m = 0$),
and 11 increases the memory size by one bit if $m < k$ (and
does not change the memory if $m = k$). So we obtain recursion
formulae for the numbers $a(m, t)$ which can be written in
matrix form as

$$\begin{pmatrix} a(0,t) \\ \vdots \\ a(k,t) \end{pmatrix} = A_k \cdot \begin{pmatrix} a(0,t-1) \\ \vdots \\ a(k,t-1) \end{pmatrix}, \quad \text{where } A_k = \begin{pmatrix} 2 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 2 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 2 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 1 & 3 \end{pmatrix}.$$


The behaviour of $a(0, t)$ is essentially determined by the
largest eigenvalue of the matrix $A_k$, which can be calculated
to be $4\cos^2\left(\frac{\pi}{4k+6}\right)$.
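
The bounded-memory recursion is easy to explore numerically. The following sketch is an added illustration under stated assumptions: the memory is taken to be empty at time 0, so that $a(\cdot, 0)$ is the first unit vector, and the function names are hypothetical. It builds the tridiagonal matrix $A_k$, compares its largest eigenvalue with $4\cos^2(\pi/(4k+6))$, and evaluates the expected time of the first 0, $E_k = \sum_{t \ge 1} t \cdot a(0, t-1)/4^t$, in closed form through the matrix identity $\sum_{t \ge 1} t B^{t-1} = (I - B)^{-2}$ with $B = A_k/4$.

```python
import numpy as np

def transfer_matrix(k):
    """(k+1) x (k+1) matrix A_k: 2 on the diagonal, 1 next to it, 3 in the last corner."""
    n = k + 1
    A = 2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    A[k, k] = 3.0  # at full memory, 01, 10 and 11 all leave the memory unchanged
    return A

def largest_eigenvalue(k):
    return float(np.linalg.eigvalsh(transfer_matrix(k)).max())

def expected_first_zero(k):
    """E_k = sum_t t * a(0, t-1) / 4^t, assuming the memory starts empty."""
    n = k + 1
    M = np.linalg.inv(np.eye(n) - transfer_matrix(k) / 4.0)
    e0 = np.zeros(n)
    e0[0] = 1.0
    return float(e0 @ M @ M @ e0) / 4.0

for k in range(5):
    print(k, largest_eigenvalue(k), 4 * np.cos(np.pi / (4 * k + 6)) ** 2,
          expected_first_zero(k))
```

For $k = 0$ this reproduces the memoryless value 4, and the values grow with $k$, in line with the divergence of the sequence $(E_k)$.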

So for size of memory bounded by $k = 0, 1, \ldots$, we obtain
a sequence of expected values for the occurrence of the first 0,
$(E_k)_{k=0}^{\infty}$, with


]>>os2( 

t=l 


1 


4k  4- 6 


t  ■  -  <  oo 
4 


In  the  special  case  k  =  1  it  turns  out  that  a(0,  t)  =  5f  ♦  ftt, 
where  denotes  the  2£-th  Fibonacci  number. 

Since the sequence $(E_k)_{k=0}^{\infty}$ is divergent, it is immediate
that the expected value for the occurrence of the first 0 in the
sequence of bits transmitted by the organizer does not exist,
if the size of the memory is bounded by a function $f(t)$
which exceeds every $k > 0$ from some $t_0$ on.

One might also consider the general case in which in each
time unit $s$ letters from a finite alphabet arrive at the channel
and $t < s$ letters have to be transmitted. For $t = s - 1$ and
constant memory size this model has been discussed (under
different aspects) in [1] (see also [2]).

Abstract - We have shown that if one invests in the
outcome of a random variable $X$, where investment
consists of gambling at any odds, then every bit of
description of $X$ increases the doubling rate by one
bit. However, if the provider of the information has
access only to $V$, a random variable jointly distributed
with $X$, then this maximal efficiency is not generally
possible. We find the increase $\Delta(R)$ in doubling rate
for a description of $V$ at rate $R$ for the jointly Gaussian
and jointly binary cases. We investigate the extension
to multivariate Gaussian random variables. We prove
a general result for the derivative of $\Delta(R)$ at $R = 0$.

We then consider the problem in which there are $k$
separate encoders and each observes a random variable
$V_i$ correlated with $X$. We find how efficiently
these encoders, without cooperation, help the investor
who is interested in $X$.

Summary 

Suppose one gambles on the outcome of a random variable $X$.
The investor distributes his wealth according to $b(x)$ and the
investment pays odds of $o(x)$ for one. Also suppose that a
description of another random variable $V$, which has a known
joint distribution with $X$, is allowed at the rate of $R$ bits. Let
$\Delta(R)$ be the maximum increase in the doubling rate from no
description to a description of rate $R$. It can be seen that
$\Delta(R)$ is a concave and nondecreasing function of $R$. We can
show [2] that

$$\Delta(R) = \max_{p(\hat{v}|v,x):\ I(\hat{V};V) \le R,\ \hat{V} \to V \to X} I(\hat{V};X).$$

We define initial efficiency as the derivative of $\Delta(R)$ at the
origin. Initial efficiency is the maximum possible increase in
$\Delta(R)$ per bit of description. For $V = X$, $\Delta(R) = R$; hence
the efficiency is 1. However, for a general $V$, the efficiency is
generally less than 1. We find $\Delta(R)$ and examine the efficiency
of the jointly binary and Gaussian cases.

Theorem 1 Suppose $V$ and $X$ are both Bernoulli(1/2) random
variables associated by a binary symmetric channel with
crossover probability $p$. The $\Delta(R)$ curve is given by

$$(R, \Delta(R)) = (1 - h(\alpha),\ 1 - h(\alpha * p))$$

where $0 < \alpha < 1$, $h$ is the binary entropy function and $*$ is
the cascade operation.

We use a lemma by Wyner and Ziv, known as 'Mrs. Gerber's
Lemma' [4] to prove the optimality of the descriptions
in the above theorem. The initial efficiency can be calculated
as $(1 - 2p)^2$.


JThis  work  was  supported  by  NSF  Grant  NCR-9205663,  ARPA 
Contract  J-FBI-94-218  and  JSEP  Contract  DAAH04-94-G-0058. 


Theorem 2 Suppose $X$ and $V$ are jointly Gaussian with
correlation $\rho$. Then

$$\Delta(R) = \frac{1}{2} \log \frac{1}{1 - \rho^2 (1 - 2^{-2R})}.$$

The proof of optimality in the Gaussian problem requires
a lemma by Bergmans, which is a conditional version of the
entropy power inequality [1]. We note that the initial efficiency
is $\rho^2$.
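
The two initial efficiencies can be checked numerically from the curves themselves. The sketch below is an added illustration: it assumes logarithms to base 2 (rates in bits), takes the cascade $\alpha * p$ to mean $\alpha(1-p) + (1-\alpha)p$, and approximates the derivative at the origin by the ratio $\Delta(R)/R$ at a small rate. The function names and the step sizes are choices made for this example.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def binary_point(alpha, p):
    """Point (R, Delta(R)) on the binary curve for parameter alpha."""
    cascade = alpha * (1 - p) + (1 - alpha) * p   # alpha * p
    return 1 - h2(alpha), 1 - h2(cascade)

def gaussian_delta(R, rho):
    """Delta(R) for jointly Gaussian X, V with correlation rho."""
    return -0.5 * math.log2(1 - rho ** 2 * (1 - 2 ** (-2 * R)))

def binary_initial_efficiency(p, alpha=0.499):
    R, delta = binary_point(alpha, p)   # alpha near 1/2 corresponds to R near 0
    return delta / R

def gaussian_initial_efficiency(rho, R=1e-6):
    return gaussian_delta(R, rho) / R

print(binary_initial_efficiency(0.1), (1 - 2 * 0.1) ** 2)
print(gaussian_initial_efficiency(0.5), 0.5 ** 2)
```

Both ratios come out close to the stated values $(1 - 2p)^2$ and $\rho^2$.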

A natural generalization of this theorem is to multivariate
Gaussian. Suppose $V^n \sim N(0, K_V)$, $Z^n \sim N(0, K_Z)$, $V^n$ and
$Z^n$ are independent and $X^n = V^n + Z^n$. By changing the
coordinate system, we can obtain diagonal covariance matrices
and hence transform the problem to one on parallel subchannels
with a total rate constraint. The solution is given by
water-filling in the entropy domain. We distribute the total
rate so that the derivative of $\Delta(R)$ with respect to $R$ at the
operating point is the same for all the subchannels used.

We note that in all the problems examined, the initial
efficiency is related to the correlation between $V$ and $X$.
We define the maximal correlation between $V$ and $X$ as the
supremum of $E f(X) g(V)$, where the supremum is over all
functions $f$ and $g$ such that $E f(X) = E g(V) = 0$ and
$E f^2(X) = E g^2(V) = 1$. Maximal correlation depends only
on the joint distribution of $V$ and $X$ and is independent of
the actual labeling. Conditions under which the maximal
correlation can be attained have been investigated by Rényi [3].
Our next theorem examines the relationship between the
initial efficiency and maximal correlation.

Theorem  3  Initial  efficiency  is  equal  to  the  square  of  the 
maximal  correlation  between  V  and  X. 

Next we consider $k$ separate senders. We are interested in
the increase in the doubling rate, $\Delta$, for gambling on $X$ when
sender $i$ observes $V_i$ correlated with $X$ and the senders operate
at respective rates $R_1, \ldots, R_k$. We prove an achievable region
for $(R_1, \ldots, R_k, \Delta)$, and show that a Slepian-Wolf type of rate
region is achievable for this investment problem.

Abstract - In a K-way minimization problem, we
are interested in finding

$$\min_{x_1 \in S_1} \cdots \min_{x_K \in S_K} f(x_1, \ldots, x_K)$$

where $f$ is continuous and bounded from below, and
$S_i$ is a compact convex set in $\mathbb{R}^{n_i}$, $1 \le i \le K$. In a
paper by Csiszár and Tusnády [2], a similar problem
with somewhat less stringent conditions was studied
for $K = 2$, where it was shown that an alternating
minimization algorithm converges to the infimum
provided a certain geometric condition is satisfied. In
this paper, we take an approach (also with strong
geometric flavor) different from theirs, which enables
us to obtain a sufficient condition for an alternating
minimization algorithm to converge to the minima.
In particular, we show that it is sufficient for $f$ to
be convex. The Arimoto-Blahut algorithm for computing
channel capacity is discussed as an example of
application of our results.

I.  An  Alternating  Minimization  Algorithm 

In a K-way minimization problem, we are interested in

$$f^* = \min_{x_1 \in S_1} \cdots \min_{x_K \in S_K} f(x_1, \ldots, x_K),$$

where $S_i$ is a subset of $\mathbb{R}^{n_i}$, $1 \le i \le K$. Here $x_i$ is an
$n_i$-tuple. We assume that $S_i$ is compact and convex, and $f$ is
continuous and bounded from below. Let $x = (x_1, \ldots, x_K)$ be
a generic point in $\prod_{j=1}^{K} S_j$. For each $x$, define for $1 \le i \le K$

$$\hat{x}_i(x) \in S_i$$

such that $\hat{x}_i(x)$ achieves

$$\min_{y \in S_i} f(x_1, \ldots, x_{i-1}, y, x_{i+1}, \ldots, x_K)$$

when $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_K$ are fixed, and let

$$g_i(x) = (x_1, \ldots, x_{i-1}, \hat{x}_i(x), x_{i+1}, \ldots, x_K).$$

Let $g(x) = g_{i^*}(x)$, where $1 \le i^* \le K$ and

$$f(g_{i^*}(x)) = \min_{1 \le i \le K} f(g_i(x)),$$

and define

$$\Delta f(x) = f(x) - f(g(x)).$$

Since $f(x) \ge f(g_i(x))$ for $1 \le i \le K$, $\Delta f(x) \ge 0$.

Let $x_0$ be any point in $\prod_{j=1}^{K} S_j$, and $x_k = g(x_{k-1})$ for $k \ge 1$.
This paper is devoted to the study of this "greedy" alternating
minimization algorithm. We show that, under suitable conditions,
$f(x_k) \to f^*$ as $k \to \infty$. Henceforth, we will abbreviate
$f(x_k)$ to $f_k$.
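
To make the greedy update concrete, here is a minimal sketch for $K = 2$ with scalar blocks. The objective, the box $[-5, 5]^2$ and all function names are assumptions chosen for this illustration, not taken from the paper. The objective is a convex quadratic, so each one-block minimization has a closed form (the unconstrained minimizer clipped to the interval), and the greedy step keeps whichever single-block update lowers $f$ the most.

```python
def f(x1, x2):
    """Hypothetical convex objective (its Hessian [[2, 1], [1, 2]] is positive definite)."""
    return (x1 - 1) ** 2 + (x2 + 2) ** 2 + x1 * x2

def clip(v, lo=-5.0, hi=5.0):
    return max(lo, min(hi, v))

def greedy_step(x):
    """Return g(x): the better of the two single-block minimizations."""
    x1, x2 = x
    candidates = [
        (clip(1 - x2 / 2), x2),    # minimize over x1 with x2 fixed
        (x1, clip(-2 - x1 / 2)),   # minimize over x2 with x1 fixed
    ]
    return min(candidates, key=lambda c: f(*c))

def run(x0, iters=200):
    x, values = x0, [f(*x0)]
    for _ in range(iters):
        x = greedy_step(x)
        values.append(f(*x))
    return x, values

x, values = run((0.0, 0.0))
print(x, values[-1])   # approaches (8/3, -10/3) with f* = -13/3
```

The recorded values $f_k$ are non-increasing, as noted in the text, and for this convex example they approach the global minimum over the box.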
II. Sufficient Conditions for Convergence

Since $f_k$ is non-increasing and $f$ is bounded from below, $f_k$
must converge to some value. We now state a condition that
is sufficient for $f_k \to f^*$.

(SC-1) Let $x^* = (x_1^*, \ldots, x_K^*) \in \prod_{j=1}^{K} S_j$
achieve $f^*$. For any $x = (x_1, \ldots, x_K) \in \prod_{j=1}^{K} S_j$
such that $f(x) > f^*$, there exists $y$ which is a convex
combination of $x_i^*$ and $x_i$ for some $1 \le i \le K$
($y \in S_i$ since $x_i^*, x_i \in S_i$ and $S_i$ is convex) such
that

$$f(x_1, \ldots, x_{i-1}, y, x_{i+1}, \ldots, x_K) < f(x).$$

It is not difficult to show that if (SC-1) is satisfied, then
$\Delta f(x) > 0$ whenever $f(x) > f^*$. Therefore, the algorithm
cannot be trapped at a local minimum. Using the assumption
that $f$ is continuous and that $S_j$, $1 \le j \le K$, is compact, it
can be shown that $f_k$ always converges to $f^*$.

We have further proved that (SC-1) is satisfied if $f$ is convex
in $x_1, \ldots, x_K$. This condition is stronger than (SC-1), but it
has the advantage that it is easy to check. In the next section,
we will show how this condition can be used to prove the
convergence of the Arimoto-Blahut algorithm for computing
channel capacity.

III.  An  Example  of  Application 

Let $\{Q(k|j)\}$ be the set of transition probabilities of a channel.
Then the channel capacity is given by

$$\max_{p} \max_{q} \sum_j \sum_k p(j) Q(k|j) \log \frac{q(j|k)}{p(j)}$$

(see Blahut [1]), which is equivalent to the negative of

$$\min_{p} \min_{q} \sum_j \sum_k p(j) Q(k|j) \log \frac{p(j)}{q(j|k)}.$$

Let

$$f(p, q) = \sum_j \sum_k p(j) Q(k|j) \log \frac{p(j)}{q(j|k)}.$$

The Arimoto-Blahut algorithm is a special case of the algorithm
described in Section I with $K = 2$; it is easy to check
that all the required conditions are satisfied. Using the results
in Section II, in order to show that the algorithm converges to
the channel capacity, we only have to show that $f$ is convex
in both $p$ and $q$. This can be done by invoking the log-sum
inequality on p. 29 of Cover and Thomas' textbook [3]. So,
the algorithm does converge to the channel capacity.
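
For readers who want to see the two alternating steps, the following is a minimal sketch of the textbook Arimoto-Blahut iteration, written in the equivalent double-maximization form. It is not code from the paper: the function name, the uniform starting point, the fixed iteration count and the assumption that every output symbol is reachable (no all-zero column in the channel matrix) are choices made for this illustration.

```python
import numpy as np

def blahut_arimoto(Q, iters=2000):
    """Approximate channel capacity in bits.

    Q[j, k] = Q(k|j), the probability of output k given input j.
    Assumes every output symbol has positive probability under some input.
    Returns (capacity_estimate, input_distribution).
    """
    Q = np.asarray(Q, dtype=float)
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])   # start from the uniform input distribution
    support = Q > 0
    for _ in range(iters):
        # Step 1: for fixed p, the best q is the posterior q(j|k).
        joint = p[:, None] * Q
        q = joint / joint.sum(axis=0, keepdims=True)
        # Step 2: for fixed q, the best p is proportional to exp(sum_k Q(k|j) log q(j|k)).
        logq = np.zeros_like(Q)
        logq[support] = np.log(q[support])
        w = np.exp((Q * logq).sum(axis=1))
        p = w / w.sum()
    joint = p[:, None] * Q
    q = joint / joint.sum(axis=0, keepdims=True)
    p_grid = np.broadcast_to(p[:, None], Q.shape)
    capacity = float((joint[support] * np.log2(q[support] / p_grid[support])).sum())
    return capacity, p

bsc = [[0.9, 0.1], [0.1, 0.9]]        # binary symmetric channel, crossover 0.1
z_channel = [[1.0, 0.0], [0.5, 0.5]]  # Z-channel, crossover 0.5
print(blahut_arimoto(bsc)[0])         # close to 1 - h(0.1)
print(blahut_arimoto(z_channel))      # capacity close to log2(5/4)
```

Each pass performs one minimization of $f(p, q)$ over $q$ and one over $p$, so the iteration is an instance of alternating minimization with $K = 2$ (here in the plain alternating order rather than the greedy order).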

I. Introduction

We study the properties of a sequence of dependent random
variables generated with the following scheme. A sequence of
independent identically distributed random variables $\alpha$ arrives
at the input of a $k$-register; the random variables take values
0 and 1 with probabilities not equal to 1/2. The sequence
$\eta$ of random variables generated by the $k$-register for $k \ge 2$
is a stationary random sequence with dependent components.
The sequence $\eta$ is taken as the input to a memoryless binary
symmetric channel with input-independent noise, i.e. $\eta$ is
added coordinatewise modulo 2 to a sequence $\beta$ of independent
identically distributed random variables that also take
the values 0 and 1 with probabilities not equal to 1/2 and are
independent of the sequence $\alpha$.

In this paper we derive upper and lower estimates for the
entropy of the stationary non-Markov random source identified
with the channel output. The upper estimate is based on
the well-known subadditivity property [1] of the entropy of a
finite-dimensional distribution. The main result is the proof
of a nontrivial lower estimate of the entropy for two particular
$k$-registers: $k = 2$ and $k = 3$. If the probabilities of 0 and 1
in the sequences $\alpha$ and $\beta$ are close to 1/2, this estimate shows
that the entropy of the source increases when $k$ grows from 1
to 3. Pre-transformation of $\alpha$ by the $k$-register, $k > 1$, yields
an increase of the entropy of the additive source $\alpha + \beta$ over
the case $k = 1$. This property of increase of the entropy is
significant for constructing a strong random source from several
"weak" ones.


II.  Results 

Let $a$ and $b$, $0 < a, b < 1$, be real numbers. By $\alpha_i$, $\beta_i$ for
$i = 1, 2, \ldots$, we denote independent random variables that
take the values 0 and 1 with probabilities

$$P\{\alpha_i = x\} = \frac{1 + a(-1)^x}{2}, \qquad P\{\beta_i = x\} = \frac{1 + b(-1)^x}{2}, \qquad x = 0, 1.$$

Take an integer $k \ge 1$ and consider a stationary discrete random
source $\eta^{(k)} = \{\eta_1^{(k)}, \eta_2^{(k)}, \ldots\}$, where $\eta_i^{(k)}$, $i \ge 1$, take the
values 0 or 1 and are defined as

$$\eta_i^{(k)} = \beta_i + \sum_{j=0}^{k-1} \alpha_{i+j} \mod 2. \qquad (1)$$

The symbol

$$Q_k^{(a,b)}(x(n)) = P\{\eta^{(k)}(n) = x(n)\}$$

denotes the finite-dimensional distribution of source (1). The
entropy of source (1) is defined as

$$H_k(a, b) = \lim_{n \to \infty} \frac{1}{n} H(\eta^{(k)}(n)),$$
where

$$H(\eta^{(k)}(n)) = -\sum_{x(n)} Q_k^{(a,b)}(x(n)) \ln Q_k^{(a,b)}(x(n))$$

is the entropy of the finite-dimensional distribution.

We define a binary entropy as

$$h(\delta) = -\delta \ln \delta - (1 - \delta) \ln(1 - \delta).$$
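
For small block lengths the finite-dimensional distribution of source (1) can be enumerated exactly, which gives a feel for how the register length $k$ affects the entropy. The sketch below is an added illustration, not the authors' method; the function names, the block length $n = 4$ and the parameters $a = b = 0.3$ are assumptions made for this example. It computes the per-symbol block entropy $H(\eta^{(k)}(n))/n$ in nats, which upper-bounds the entropy rate $H_k(a, b)$ and equals it exactly for $k = 1$, where the source is memoryless with $P\{\eta_i = 1\} = (1 - ab)/2$.

```python
import itertools
import math

def block_entropy_per_symbol(k, n, a, b):
    """H(eta^(k)(n)) / n in nats, by exact enumeration of alpha and beta."""
    p_alpha = ((1 + a) / 2, (1 - a) / 2)   # P{alpha_i = 0}, P{alpha_i = 1}
    p_beta = ((1 + b) / 2, (1 - b) / 2)    # P{beta_i = 0},  P{beta_i = 1}
    dist = {}
    for alpha in itertools.product((0, 1), repeat=n + k - 1):
        w_alpha = math.prod(p_alpha[x] for x in alpha)
        # Register output: mod-2 sum of k consecutive alpha symbols.
        reg = [sum(alpha[i:i + k]) % 2 for i in range(n)]
        for beta in itertools.product((0, 1), repeat=n):
            w = w_alpha * math.prod(p_beta[x] for x in beta)
            eta = tuple((r + x) % 2 for r, x in zip(reg, beta))
            dist[eta] = dist.get(eta, 0.0) + w
    return -sum(q * math.log(q) for q in dist.values() if q > 0) / n

def binary_entropy(delta):
    return -delta * math.log(delta) - (1 - delta) * math.log(1 - delta)

a = b = 0.3
for k in (1, 2, 3):
    print(k, block_entropy_per_symbol(k, 4, a, b))
print(binary_entropy((1 - a * b) / 2), math.log(2))
```

With these parameters the printed values increase with $k$ and approach $\ln 2$. The values are finite-$n$ block entropies, so they illustrate the trend rather than the limits $H_k(a, b)$ themselves.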


Theorem 1. For any $k > 3$


Hk(a,b)<  min  (h  ' 


Theorem  2.  For  k  —  2, 


H2(a,b)  >  In 2  —  In  I  1  + 


1  —  b2 


Theorem 3. For $k = 3$,


Hs(a,b)  >  In  2  — In  1  + 


a464  +  a6fc2(l 


(1  —  a262)2 


Theorem 4. Assume that the probabilities of 0 and 1 in
sequences $\alpha$ and $\beta$ are close to 1/2, i.e., the parameters $a$ and
$b$ are close to zero. Then for $k = 1, 2, 3$ the entropy $H_k(a, b)$
increases with the growth of $k$.

Abstract - Various definitions of the capacity of the
multiple-access collision channel have been investigated
in Random Multiple Access theory. However,
as was pointed out by Tsybakov [4], almost nothing
is known about the relations between the various
definitions. In this paper we try to fill this gap by showing
that some widely used capacity definitions
are equivalent.


I.  Introduction 

The study of collision channels, also called random-access
channels, started with Abramson's ALOHA system [1] which
uses only binary feedback (collision/no collision). Later on
this channel model (and its modifications) attracted special
interest: it was investigated in numerous research articles. The
goal of all such papers is to present good conflict resolution
algorithms and to get bounds on the efficiency of the best
possible ones. For this reason one has to measure somehow how
efficient an algorithm is. But different authors quite often
measured this in different ways, obtaining different definitions
for the throughput of an RMA algorithm, which is nothing
else than one of these measures. This led to the study of different
capacity notions, since capacity is, roughly speaking, the best
possible throughput which might be achieved. On the other
hand it is not obvious at all that an algorithm being efficient
(i.e. having a high throughput) in one sense is also efficient
from another point of view. In [4] Tsybakov gave an
excellent survey of Random Multiple-Access communication,
where he wrote about this problem that "... we know
almost nothing about the relations between the various
definitions of delay, throughput and capacity". In this paper we
will show that some of the most widely used definitions for
the throughput and capacity of the multiple access collision
channel are equivalent in the case when the feedback is the
multiplicity of the collisions.

II.  Summary  of  Results 

Assume that $x_1 < x_2 < x_3 < \ldots$ is a random process where
$x_i$ is the generating time of the $i$th packet. We will suppose
that the instants of new-packet generations form a Poisson
process, i.e. the differences $(x_{i+1} - x_i)$ are independent
random variables with the identical distribution

$$\Pr\{x_{i+1} - x_i > x\} = e^{-\lambda x}.$$

A conflict resolution protocol (or random multiple access
algorithm) is a retransmission algorithm $f$ for the packets in a
collision. The delay $\delta$ of a packet is the time from its moment
of generation until the moment of its successful transmission.
Let $\delta_i$ denote the delay of the $i$th packet. The delay of a
random multiple access algorithm $f$ is

$$D_f = \limsup_{i \to \infty} E(\delta_i),$$
where $E()$ denotes expectation, and its throughput is

$$R_f^1 = \sup\{\lambda : D_f < \infty\}.$$

In the case of blocked access the number of active users in
subsequent epochs forms a Markov chain $M$. This implies
the following definition for the throughput of a blocked access
algorithm:

$$R_f^2 = \sup\{\lambda : M \text{ is stable}\}.$$

It is very natural to define the throughput as the ratio of
the number of generated messages to the time needed to
transmit them. More precisely, denote by $X(t)$ the number of
generated messages in the time interval $(0, t)$ and by $\tau(X(t))$
the number of steps which are needed to transmit these messages.
Thus the throughput might be also defined [2] as

$$R_f^3 = \liminf_{t \to \infty} \frac{E\, X(t)}{E(\tau(X(t)))}.$$

The three throughput notions listed above imply
three different capacity definitions in the following way. Let
$\mathcal{A}$ denote the set of random multiple access algorithms. The
capacity of the random multiple access collision channel is
defined as

$$C = \sup\{R_f : f \in \mathcal{A}\},$$

where the supremum can be taken over each of the throughputs
defined before. Thus we get $C^1$, $C^2$, and $C^3$, respectively.

In 1981 Pippenger proved by a probabilistic argument [2] that
there exists an algorithm $f$ whose throughput is 1. Ruszinkó and
Vanroose [3] constructed such an algorithm. Let us denote
this algorithm by RV. We claim that the following statement
holds.

Theorem.

$$R^1(RV) = R^2(RV) = R^3(RV) = 1,$$

thus

$$C^1 = C^2 = C^3 = 1.$$

Consequently these throughput and capacity definitions are
equivalent.

Abstract — We derive upper and lower bounds on
the number of ways a rectangular M x N matrix
can be partitioned into fragments. Next, the problem
of matrix partitioning is considered as a particular
example of a more general problem of constructing
two-dimensional Markov processes (fields) on discrete
rectangular lattices. We discuss a matrix-theoretical
approach to the problem to explore the structure of
discrete fields defined by a given matrix of local
interaction.

In this paper we continue to study the problem formulated
in [1]. Let $\phi(M;N)$ be the number of ways an $M \times N$
rectangular matrix (with empty cells) can be split into
fragments. Immediate calculations show that

$$\phi(2;2) = 12, \quad \phi(2;3) = 74, \quad \phi(3;3) = 1442, \quad \phi(4;4) > 1.7 \times 10^{6}$$


Then we prove the main result of the paper, establishing the
exact exponential behavior of $\phi(M;N)$. Let

$$R(\infty;\infty) = \lim_{M,N \to \infty} R(M;N)$$

be the asymptotic information rate of the form source.

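Theorem 2.

$$R(\infty;\infty) = \tfrac{1}{2}\log_2 \lambda = 0.8322611,$$

where $\lambda = 3.1700865$ is the maximal eigenvalue of the matrix

$$\begin{pmatrix} A_{00} & A_{01} \\ A_{10} & A_{11} \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$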
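The numbers in Theorem 2 can be checked numerically. The following is a minimal sketch, assuming the 4 x 4 matrix is the one read off above and that the rate equals half the base-2 logarithm of its largest eigenvalue; the function name `max_eigenvalue` and the use of plain power iteration are illustrative choices, not the authors' method.

```python
import math

# 4x4 matrix as read from Theorem 2 (block form [[A00, A01], [A10, A11]]).
A = [
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]

def max_eigenvalue(matrix, iterations=200):
    """Estimate the largest (Perron) eigenvalue by power iteration."""
    v = [1.0] * len(matrix)          # start vector, largest entry equal to 1
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        lam = max(w)                 # growth factor, since max(v) == 1
        v = [x / lam for x in w]     # renormalise so that max(v) == 1 again
    return lam

lam = max_eigenvalue(A)
rate = 0.5 * math.log2(lam)          # assumed reading: bits per edge
print(round(lam, 7), round(rate, 7))
```

The printed values should agree with the quoted $\lambda$ and $R(\infty;\infty)$ to within rounding.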


and so on. Each individual partition of the matrix is considered
as an output of a block source with block size $2MN - M - N$
and information rate

$$R(M;N) = \frac{\log_2 \phi(M;N)}{2MN - M - N}.$$


The rate is measured in "bits per edge", because the denominator
of $R(M;N)$ is the total number of internal edges
between the cells of the matrix. The source so defined is called a
form source, where "form" means the set of contours resulting
from an individual matrix partition [2,3,4]. The exponential
behavior of $\phi(M;N)$ may be of interest in image
processing [2], statistical mechanics [3], and other applications
exploiting models based on the concept of random
fields. In [4] a special technique founded on the theory of
Fibonacci numbers was suggested and some upper and lower
bounds on $\phi(M;N)$ were obtained.

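Now we develop a formal matrix approach to the problem.
This approach is based on Perron-Frobenius theory for
nonnegative matrices [5]. We introduce special (0,1)-matrices
describing a physical process of breaking an M x N
matrix down into fragments. These four "splitting matrices" are

$$A_{00} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
A_{01} = A_{10} = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \quad
A_{11} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$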


They define the splits allowed to run through a solid state
matrix with unbreakable cells. In these terms we prove

Theorem 1. For any integers $M, N = 1, 2, \ldots$

$$\phi(M+1; N+1) = \Big\| \big[\, \|A_{i^N j^N}\| \,\big]^{M} \Big\|,$$

where

$$A_{i^N j^N} = \prod_{n=1}^{N} A_{i_n j_n}, \qquad i^N, j^N \in \{0,1\}^N,$$

and $\|\cdot\|$ denotes the sum of all matrix elements.
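The formula of Theorem 1 is easy to evaluate for small sizes. The sketch below assumes the splitting matrices are the 2 x 2 blocks of the 4 x 4 matrix in Theorem 2, and that the inner bracket is the $2^N \times 2^N$ matrix whose $(i^N, j^N)$ entry is the element sum of the product $A_{i_1 j_1} \cdots A_{i_N j_N}$. The names `phi`, `matmul` and `total` are illustrative only.

```python
from itertools import product

# Splitting matrices, assumed to be the 2x2 blocks of the Theorem 2 matrix.
A = {
    (0, 0): [[1, 0], [0, 1]],
    (0, 1): [[0, 1], [1, 1]],
    (1, 0): [[0, 1], [1, 1]],
    (1, 1): [[1, 1], [1, 1]],
}

def matmul(X, Y):
    """Plain integer matrix product."""
    return [[sum(X[r][k] * Y[k][c] for k in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def total(X):
    """The norm ||X|| of the text: the sum of all matrix elements."""
    return sum(map(sum, X))

def phi(rows, cols):
    """Evaluate phi(M+1; N+1) with rows = M+1 >= 2 and cols = N+1 >= 2."""
    M, N = rows - 1, cols - 1
    words = list(product((0, 1), repeat=N))
    B = []
    for i in words:
        row = []
        for j in words:
            P = [[1, 0], [0, 1]]
            for a, b in zip(i, j):           # product over n = 1..N
                P = matmul(P, A[(a, b)])
            row.append(total(P))
        B.append(row)
    size = len(words)
    power = [[int(r == c) for c in range(size)] for r in range(size)]
    for _ in range(M):                       # M-th power of the bracket
        power = matmul(power, B)
    return total(power)

print(phi(2, 2), phi(2, 3), phi(3, 3))
```

Under these assumptions the three smallest values quoted in the text are reproduced.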


The second part of the paper is devoted to a probabilistic
modification of the problem. We consider a 4 x 4 stochastic
"splitting matrix" with the same zero elements as the 4 x 4
matrix shown in Theorem 2 and with arbitrary positive
probabilities substituted for the ones. We show that the maximal
entropy rate of the probabilistic form source so defined
asymptotically coincides with the information rate of the
deterministic form source shown in Theorem 2.

The 4 x 4 matrix shown in Theorem 2 reflects the requirement
of continuity of a contour. We present a more general form of
lower and upper bounds on $\phi(M;N)$ defined by an arbitrarily
given matrix of the local interaction [6] in the field and show
some results of computational experiments.

Finally  some  possible  schemes  of  the  form  source  coding 
are  discussed. 

I. Introduction

During the last decades, new experimental approaches to the
study of animal communication systems have been developed,
including those based on a direct dialog with animals taught
artificial intermediary languages. The use of simple grammatical
rules, as well as number-related skills at the level of pre-school
children, has been demonstrated in the chimpanzee [1] and
in the grey parrot [2]. However, the question of the existence
of a developed natural language in social animals is
still open for discussion. We have suggested quite
a different approach to the study of communication
systems, based on the ideas of information theory. Our
recent experiments provided evidence of a potentially
unlimited number of messages in ant "language", and
showed that ants are able to use regularities of the
"text" for information compression [3,4]. In
this report we consider the plasticity of ant language as well
as ants' numerical competence.

II.  Methods 

Ants were kept in transparent nests in laboratory
arenas. Each worker was labelled with an individual
colour mark. As soon as scouting ants found food, they
informed relatively constant teams of 5-8 foragers
about it. During experiments ants were fed in setups
consisting of a long "trunk" with 25-40 equally spaced
branches made of thin plastic sticks. Each branch ended
in an empty trough, except for one filled with syrup. To
start the experiment, an ant scout was placed at the
trough containing food. When it returned to the nest, the
duration of the contact between foragers and the scout was
measured. As soon as foragers began following the scout,
the scout was removed from the arena with tweezers. To
avoid odour tracks, the original maze was replaced by an
identical one.

III. Ant Numerical Competence and Plasticity
of Ant "Language"

It turned out that ants can count within several tens, and
that their "language" has means of transmitting
messages about the number of objects. Over all experiments,
the teams left the nest after being contacted
and moved towards the troughs 130 times. In 90 cases the
team immediately found the correct way. The probability
of finding the food-containing trough randomly is less
than $10^{-10}$. The relation between the number $i$ of the
branch and the duration $t$ of the contact between scout
and foragers is linear and described by the equation
$t = ai + b$. Note that in modern human languages with
decimal numeration the length of the written form of
a number $i$ and the time to pronounce the number $i$
are proportional to $\log i$, not to $i$. Archaic human
languages are known to have used another system of
numeration. The number "one" was encoded as the word
"finger", "two" as "finger, finger", etc. In this case, the
time required to pronounce $i$ is also proportional to $i$, as in
ant "language". Such a large difference between modern
human and ant languages does not necessarily show that
the latter is primitive, since it is known that in a "reasonable"
language the length of a word should correspond to its
frequency of occurrence in communications. We then
considered to what extent ant "language" can transform
to keep this correspondence valid. In special experiments a
horizontal trunk with 40 branches was used; however, the
trough was placed on different branches with different
frequencies: on two "special" branches (No. 10 and No. 20) it
appeared in about 2 cases out of 3. At first the time required
to transmit information on the number $i$ of the branch
was proportional to $i$. But about halfway through the
series, the time needed to transmit the information that
the trough was on a "special" branch became
much shorter than in the cases when the trough was on
other, seldom used branches. It should be emphasized
that the time ceased to be proportional to $i$, perhaps as a
result of a transformation in the communication system of
these ants, caused by a change in "numerical" frequency.
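The remark that word length should follow frequency can be made concrete with ideal code lengths $-\log_2 p$. The sketch below is only an illustration: it assumes that the two "special" branches share the probability 2/3 equally and that the remaining 1/3 is spread uniformly over the other 38 branches, which the text does not state explicitly.

```python
import math

def ideal_lengths(branches=40, special=(10, 20), special_mass=2 / 3):
    """Ideal message lengths in bits under the assumed branch frequencies."""
    rest = branches - len(special)
    probs = {}
    for i in range(1, branches + 1):
        if i in special:
            probs[i] = special_mass / len(special)   # frequent branches
        else:
            probs[i] = (1 - special_mass) / rest     # seldom used branches
    return probs, {i: -math.log2(p) for i, p in probs.items()}

probs, bits = ideal_lengths()
print(round(bits[10], 2), round(bits[1], 2))
```

Under these assumed frequencies a message for a special branch would ideally be several times shorter than one for a rare branch, which is the direction of the change reported above.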

I. Introduction and Algorithms

Let $A$ be an abstract source alphabet and $\hat{A}$ a finite
reproduction alphabet. If $x = (x_i)$ is a finite or infinite sequence
of symbols from $A$ or $\hat{A}$ (or of random variables taking their
values in these sets), let $x_m^n = (x_m, \ldots, x_n)$ and, for
simplicity, write $x_1^n$ as $x^n$. We denote the set of all $n$-tuples
drawn from $A$ ($\hat{A}$) by $A^n$ ($\hat{A}^n$). A lossless codeword length
function (LCLF) $l$ is a map from $\hat{A}^*$, the set of all finite
sequences from $\hat{A}$, to $\{1, 2, \ldots\}$ satisfying
$\sum_{y^n \in \hat{A}^n} 2^{-l(y^n)} \le 1$ for any $n > 0$.
Clearly, there exists a one-to-one correspondence
between lossless codeword length functions and prefix codes:
for any LCLF $l$, there exists a prefix code $\phi : \hat{A}^* \to \{0,1\}^*$
such that for any $y \in \hat{A}^*$, $l(y)$ = the length of $\phi(y)$, and vice
versa. Well-known examples are the Lempel-Ziv codeword
length function $LZ(y^n)$ and the $k$-th order arithmetic codeword
length function $L_{A,k}(y^n)$. Let $\rho : A \times \hat{A} \to [0, +\infty)$
be a single-letter distortion measure. For any stationary,
ergodic source $\mu$, let $R(D,\mu)$ and $D(R,\mu)$ denote its rate
distortion function and distortion rate function with respect
to the fidelity criterion $\{\rho_n\}$ generated by $\rho$, respectively,
where $\rho_n(x^n, y^n) = n^{-1} \sum_{i=1}^{n} \rho(x_i, y_i)$ for any
$x^n \in A^n$ and $y^n \in \hat{A}^n$. For simplicity, we shall assume
that a reference letter $a^* \in \hat{A}$ exists for $\rho$ and $\mu$ such
that $E\rho(X_1, a^*) < \infty$ and
that $\sup_{x \in A} \inf_{y \in \hat{A}} \rho(x, y) = 0$.

Corresponding to any LCLF $l$, three universal lossy data
compression schemes are presented in this paper: one is with
fixed rate, another is with fixed distortion, and a third is with
fixed slope.

A fixed rate universal lossy data compression scheme. Fix
$R > 0$. Let $N(R,l)$ be the smallest positive integer such
that the set $\{y^n \in \hat{A}^n : l(y^n) \le nR\}$ is nonempty for all
$n \ge N(R,l)$. Let $B_n(l)$ ($n \ge N(R,l)$) consist of all $y^n \in \hat{A}^n$
such that $l(y^n) \le nR$. In our fixed rate universal lossy data
compression scheme, each source sequence $x^n \in A^n$ is
quantized into a closest member $y^n$ of $B_n(l)$. There are two
different ways for the encoder to encode $x^n$: (1) the encoder
can transmit the index of $y^n$ in $B_n(l)$ using a binary string of
length $\lfloor nR \rfloor$, or (2) the encoder can transmit the binary
codeword associated with $y^n$ via the LCLF $l$, adding some dummy
digits to ensure overall codeword length $\lfloor nR \rfloor$.

A fixed distortion universal lossy data compression scheme.
Fix $D > 0$. For each $n \ge 1$, we think of the entire set $\hat{A}^n$ as a
codebook of dimension $n$ and list the elements $y^n$ of $\hat{A}^n$ in
order of nondecreasing lossless codeword length $l(y^n)$. For each
$x^n \in A^n$, the encoder maps $x^n$ into the binary codeword
associated with $y^n$ via the LCLF $l$, where $y^n$ is the first element
in $\hat{A}^n$ such that $\rho_n(x^n, y^n) \le D$.
A fixed slope universal lossy data compression scheme. Let
$\lambda > 0$ be fixed. Our fixed slope universal lossy data
compression scheme works as follows: for each $x^n \in A^n$, the encoder
first searches for the first element $y^n$ in $\hat{A}^n$ which minimizes the
cost function $n^{-1} l(y^n) + \lambda \rho_n(x^n, y^n)$ over the whole set $\hat{A}^n$,
where $\hat{A}^n$ is assumed to be ordered in some order, and then
encodes $x^n$ into the binary codeword associated with $y^n$. After
receiving the binary codeword, the decoder can completely
recover $y^n$ and output $y^n$ as a reproduction of $x^n$. In this
way, the resulting rate $r_n(x^n, l, \lambda)$ in bits per sample is
$n^{-1} l(y^n)$, and the resulting distortion $\rho_n(x^n, l, \lambda)$ per sample
is $\rho_n(x^n, y^n)$.
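The fixed slope rule can be sketched by exhaustive search on a tiny binary example. Everything specific here is an assumption made for illustration: the alphabets are both $\{0,1\}$, the distortion is Hamming, and `toy_lclf` is a simple two-part length function (it satisfies the Kraft inequality but is neither the Lempel-Ziv nor the arithmetic length function named in the text).

```python
import math
from itertools import product

def ceil_log2(m):
    """Smallest integer c with 2**c >= m, for m >= 1."""
    return (m - 1).bit_length()

def toy_lclf(y):
    """Hypothetical LCLF: bits for the number of ones, then for the index
    of y among all words of the same length with that many ones."""
    n, k = len(y), sum(y)
    return ceil_log2(n + 1) + ceil_log2(math.comb(n, k))

def hamming_per_sample(x, y):
    return sum(a != b for a, b in zip(x, y)) / len(x)

def fixed_slope_encode(x, lam, length_fn=toy_lclf):
    """Return (y, rate, distortion) for the first minimizer of
    length_fn(y)/n + lam * distortion, scanning words lexicographically."""
    n = len(x)
    best = None
    for y in product((0, 1), repeat=n):
        rate = length_fn(y) / n
        dist = hamming_per_sample(x, y)
        cost = rate + lam * dist
        if best is None or cost < best[0]:   # strict: keep the first minimizer
            best = (cost, y, rate, dist)
    _, y, rate, dist = best
    return y, rate, dist

x = (1, 0, 1, 1, 0, 1, 0, 0)
print(fixed_slope_encode(x, 100.0))   # large slope: distortion is expensive
print(fixed_slope_encode(x, 0.01))    # small slope: rate is expensive
```

The exhaustive scan is exponential in $n$ and is only meant to make the cost function explicit; it says nothing about how a practical search would be organized.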

II.  Optimality 

The fixed rate or fixed distortion lossy data compression
algorithm mentioned above is just the extension of the
corresponding one in [1] to the case of any LCLF. Under some
mild conditions on $l$, similar results to [1] can be proved. In
the following, therefore, we focus only on the fixed slope lossy
data compression algorithm.

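An LCLF $l$ is said to satisfy Condition A if for any stationary,
ergodic process $\{Y_i\}_{i=1}^{\infty}$ taking values in $\hat{A}$, $n^{-1} l(Y^n)$ converges
with probability one to the entropy rate of $\{Y_i\}_{i=1}^{\infty}$.

Theorem 1 Let $\lambda > 0$. Let $\mu$ be a stationary, ergodic source
with the random output $X = \{X_i\}_{i=1}^{\infty}$. If $l$ satisfies Condition
A, then as $n \to \infty$,

(i) $r_n(X^n, l, \lambda) + \lambda \rho_n(X^n, l, \lambda) \to R_\lambda(\mu) + \lambda D_\lambda(\mu)$ almost
surely, where $D_\lambda(\mu) = \inf\{D \mid D > 0,\ R_+(D,\mu) \ge -\lambda\}$
and $R_\lambda(\mu) = R(D_\lambda(\mu), \mu)$.

(ii) $r_n(X^n, l, \lambda)$ ($\rho_n(X^n, l, \lambda)$) converges almost
surely to $R_\lambda(\mu)$ ($D_\lambda(\mu)$), provided $(D_\lambda(\mu), R_\lambda(\mu))$
is the only point on the rate distortion curve such that
$R_-(D_\lambda(\mu), \mu) \le -\lambda \le R_+(D_\lambda(\mu), \mu)$.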

In particular, Theorem 1 holds for the $k$-th order arithmetic
codeword length function $L_{A,k}$ (i.e., $l = L_{A,k}$) if $k$ is allowed
to go to infinity. During the process of proving Theorem 1, we
also obtain a very strong sample converse theorem for variable
length source coding which implies Kieffer's sample converse
theorem and strong converse theorem as corollaries.

The main advantage of this fixed slope universal lossy data
compression scheme over the fixed rate (fixed distortion)
universal lossy data compression scheme lies in the fact that it
converts the encoding problem to a search problem through
a trellis and then permits one to use sequential search
algorithms to implement it. Simulation results with the $k$-th
order arithmetic codeword length function as the LCLF and the
M-algorithm as the sequential search algorithm show that this
fixed slope universal algorithm, combined with suitable search
algorithms, might be implementable in practice.

Abstract — A practical suboptimal (variable source
coding) algorithm for lossy data compression is
presented. This scheme is based on approximate
string matching, and it extends the lossless Wyner-Ziv
data compression scheme.


I.  Introduction  and  Main  Results 

We consider a stationary and ergodic sequence $\{X_k\}_{k=-\infty}^{\infty}$
taking values in a binary alphabet $\Sigma = \{0,1\}$. We write $X_m^n$
to denote $X_m X_{m+1} \ldots X_n$. As a measure of fidelity we
consider the Hamming distance (however, other fidelity criteria
can be easily accommodated into our main results) defined
as $d_n(x_1^n, \tilde{x}_1^n) = \frac{1}{n} \sum_{i=1}^{n} d_1(x_i, \tilde{x}_i)$, where $d_1(x, \tilde{x}) = 0$ for
$x = \tilde{x}$ and 1 otherwise ($x, \tilde{x} \in \Sigma$). We assume that the
maximum allowed distortion is $D$, and by $R(D)$ we denote the
rate-distortion function (cf. [1]).

We propose a practical suboptimal lossy data compression
scheme that extends the Lempel-Ziv scheme. Our scheme
reduces to the following approximate pattern matching
problem: let the "training sequence" or "database sequence" $x_1^n$
be given. Find the longest $L_n$ such that there exists $1 \le i_0 \le n$
in the database satisfying $d(x_{i_0}^{i_0 - 1 + L_n}, x_{n+1}^{n + L_n}) \le D$. This
naturally extends the idea of Wyner and Ziv [5] (cf. also [4]) to the lossy
situation (cf. also [3]), which is the subject of this work.

Actually, the real engine behind this study (and its
algorithmic issues) is a probabilistic analysis of an approximate
pattern matching problem which we discuss next. Our
probabilistic results are confined to the stationary mixing model
in which two random events defined on two $\sigma$-algebras
separated by $g$ symbols behave like independent events as $g \to \infty$.
We denote by $\alpha(g)$ the mixing coefficient, and assume that
$\alpha(g) \to 0$ as $g \to \infty$.

It turns out that the behavior of $L_n$ is related to two other
quantities, namely the shortest path $s_n$ and the height $H_n$
defined in the sequel. The height $H_n$ is the length of the
longest substring in the database $X_1^n$ for which there exists
another substring in the database within distance $D$. More
precisely: the height is equal to the largest $N$ for which there
exist distinct $1 \le i, j \le n$ such that $d(X_i^{i-1+N}, X_j^{j-1+N}) \le D$. Let
now $\mathcal{W}_k$ be the set of words of length $k$, and $w_k \in \mathcal{W}_k$. Then
the shortest path $s_n$ is the longest $k$ such that for every $w_k \in \mathcal{W}_k$
there exists $1 \le i \le n$ such that $d(X_i^{i-1+k}, w_k) \le D$.
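The approximate pattern matching problem behind $L_n$ can be sketched by brute force. This is only an illustration under stated assumptions: the distance is taken to be the per-symbol Hamming distance, the match is allowed to start anywhere in the first $n$ symbols, and the function name `longest_approx_match` is hypothetical. All lengths are tried because the normalized distance need not be monotone in the length.

```python
def hamming(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))

def longest_approx_match(x, n, D):
    """Return (L, i0): the largest L such that some window of length L
    starting at 1-based position i0 <= n is within per-symbol Hamming
    distance D of the L symbols that follow the first n symbols of x."""
    max_len = len(x) - n
    for L in range(max_len, 0, -1):          # longest candidates first
        target = x[n:n + L]
        for i0 in range(n):
            if hamming(x[i0:i0 + L], target) <= D * L:
                return L, i0 + 1
    return 0, None

x = [0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1]      # first 8 symbols: database
print(longest_approx_match(x, 8, 0.0))        # exact matching
print(longest_approx_match(x, 8, 0.25))       # one error in four allowed
```

The cubic running time is acceptable only for toy inputs; the text above is concerned with the asymptotics of $L_n$, not with how to compute it efficiently.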

The asymptotic behaviors of $H_n$ and $s_n$ depend on
generalized Renyi entropies $r_b(D)$ that we define below. We
write $B_D(w_k)$ for a ball of radius $D$ of sequences from $\mathcal{W}_k$,
that is, $B_D(w_k) = \{x_1^k : d(x_1^k, w_k) \le D\}$.

Definition: For any $-\infty \le b \le \infty$

$$r_b(D) = \lim_{k \to \infty} \frac{-\log E P^{b}(B_D(X_1^k))}{b k},$$

where for $b = 0$ we understand $r_0(D) = \lim_{b \to 0} r_b(D)$, that is,

$$r_0(D) = \lim_{k \to \infty} \frac{-E \log P(B_D(X_1^k))}{k},$$

provided the above limits exist.

Using the subadditive ergodic theorem, we can prove that
the entropies $r_b(D)$ exist in a stationary mixing model. The
main result of the paper is summarized below.

Theorem. In a mixing model with the mixing coefficient tending
to zero the following holds:

$$\lim_{n \to \infty} \frac{L_n}{\log n} = \frac{1}{r_0(D)} \quad \text{(pr.)}$$

But $L_n$ does not converge almost surely to any limit, and
actually the following is true:

$$\liminf_{n \to \infty} \frac{L_n}{\log n} = \frac{1}{r_{-\infty}(D)}, \qquad
\limsup_{n \to \infty} \frac{L_n}{\log n} \le \frac{1}{r_1(D)} \quad \text{(a.s.)}$$

for the Markovian model. In the Bernoulli model, the last
inequality can be replaced by equality.

In a related paper Steinberg and Gutman [3] analyzed
the so-called waiting time, defined as the number $N_l$ such
that the beginning substring of length $l$ reoccurs
approximately in the string for the first time after $N_l$ symbols.
The authors of [3] proved that for a stationary ergodic
sequence $\limsup_{l \to \infty} \log N_l / l \le R(D/2)$ (pr.). As a
corollary to our main result we show that in the mixing model
$\lim_{l \to \infty} \log N_l / l = r_0(D)$ (a.s.), which ultimately settles the
problem of [3].

The Gold-Washing Algorithm (II): Optimality for $\phi$-Mixing Sources*

Abstract — Two versions of the Gold-Washing data
compression algorithm, one with a codebook innovation
interval and the other with finitely many codebook
innovations, are considered. The corresponding
optimality results are proved for stationary, $\phi$-mixing
sources.

I.  Description  of  Algorithms 

In their recent paper [1], Zhang and Wei proposed a
universal lossy data compression algorithm called the
Gold-Washing (GW) algorithm. Let $A$ and $\hat{A}$ be our source alphabet
and reproduction alphabet, respectively. Fix $R > 0$ and let
$L = \lfloor 2^{nR} \rfloor$. For each $n \ge 1$, the GW algorithm acts like
an adaptive vector quantizer when it is applied to encode a
source sequence $x = \{x_i\}_{i=1}^{\infty}$ from $A$. It first parses the source
sequence $x = \{x_i\}_{i=1}^{\infty}$ into non-overlapping source words of
length $n$, $x^n(t) = (x_{(t-1)n+1}, x_{(t-1)n+2}, \ldots, x_{tn})$, $t = 1, 2, \ldots$,
and then uses a codebook $C_n(t-1)$, which changes slowly in
time, to quantize $x^n(t)$. Each codebook $C_n(t-1)$ consists of an
ordered list of $2L$ entries. Each entry in the first half (denoted
by $C'_n(t-1)$) of $C_n(t-1)$ is merely an $n$-length reproduction
sequence called a codeword from $\hat{A}$, whereas each entry in the
second half of $C_n(t-1)$ consists of a codeword from $\hat{A}$ and
a counter associated with the codeword. When the codebook
$C_n(t-1)$ is used to quantize the source word $x^n(t)$, the encoder
maps $x^n(t)$ to the smallest index for which the corresponding
codeword yields the smallest distortion among $C_n(t-1)$.
After $x^n(t)$ is encoded, the codebook $C_n(t-1)$ is innovated and
changed to $C_n(t)$. The innovation operation of $C_n(t-1)$ is
as follows. (Assume an index $i$ is assigned to $x^n(t)$ by the
encoder.)

S1 If $i > L$, the counter associated with the $i$-th codeword
in $C_n(t-1)$ is incremented by 1.

S2 If the counter associated with the $(L+1)$-th codeword
in $C_n(t-1)$ is $\ge$ the threshold prior to the execution of S1, then a
randomly selected codeword from $C'_n(t-1)$ is discarded
and, at the same time, the $(L+1)$-th codeword in $C_n(t-1)$
is promoted into $C'_n(t-1)$; otherwise, the $(L+1)$-th
entry in $C_n(t-1)$, including the codeword and counter,
is discarded and the first $L$ entries in $C_n(t-1)$ remain
unchanged.

S3 Each entry from the $(L+2)$-th position to the $2L$-th
position is moved one step forward.

S4 Finally, a new randomly selected codeword according to
a prescribed distribution occupies the $2L$-th vacant
position and its counter is set to 0; the resulting codebook
is denoted by $C_n(t)$ and used to quantize $x^n(t+1)$.

In S2, the quantity compared with the counter is a threshold, and $\zeta$ is a number $> 2$. Initially,
the codebook $C_n(0)$ is selected arbitrarily and all counters in
the second half of $C_n(0)$ are set to zero. Knowing the initial
codebook, the new random codeword in S4, and the discarded
codeword in S2 when promotion occurs, the decoder performs
the codebook innovation operation in exactly the same way as
the encoder does.
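One encoding step and one innovation step (S1-S4) can be sketched as follows. This is a toy illustration under several assumptions that are not taken from the text: codewords are binary tuples, the distortion is Hamming, new codewords are drawn uniformly, the promoted codeword takes the slot of the discarded one, and the threshold is passed in as a plain number because its formula is not legible here. The names `encode` and `innovate` are hypothetical.

```python
import random

def encode(first, second, word):
    """Smallest 1-based index whose codeword is closest to `word`."""
    codewords = list(first) + [cw for cw, _ in second]
    dists = [sum(a != b for a, b in zip(word, cw)) for cw in codewords]
    return dists.index(min(dists)) + 1

def innovate(first, second, i, threshold, rng, n):
    """Apply S1-S4 and return the new (first half, second half).
    `first` is a list of codewords; `second` is a list of [codeword, counter]."""
    first = list(first)                       # work on copies
    second = [list(entry) for entry in second]
    L = len(first)
    promote = second[0][1] >= threshold       # counter checked before S1
    if i > L:                                 # S1: count a hit in the second half
        second[i - L - 1][1] += 1
    if promote:                               # S2: promotion into the first half
        first[rng.randrange(L)] = second[0][0]
    second = second[1:]                       # S2 removal of entry L+1, S3 shift
    new_word = tuple(rng.randint(0, 1) for _ in range(n))
    second.append([new_word, 0])              # S4: fresh codeword, counter 0
    return first, second

first = [(0, 0), (1, 1)]
second = [[(0, 1), 3], [(1, 0), 0]]
i = encode(first, second, (1, 0))
print(i, innovate(first, second, i, threshold=2, rng=random.Random(0), n=2))
```

Since the decoder is told the random choices, it can run the same `innovate` step and stay synchronized with the encoder, as described above.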

It was proved in [1] that the above mentioned GW
algorithm is optimal for memoryless sources. In this paper, our
aim is to investigate the asymptotic optimality of the GW
algorithm for stationary, $\phi$-mixing sources. Accordingly, we
shall consider the following two versions of the GW algorithm.

GW algorithm with codebook innovation interval $k$: This
version of the GW algorithm is similar to the original one
mentioned above except that this time the encoder
innovates its codebook only when $t = (k+1)m$, $m = 1, 2, \ldots$.
In other words, the time interval between two consecutive
codebook innovations is $k$; during the time period from
$t = (k+1)(m-1)+1$ to $t = (k+1)m$, the codebook is held fixed
and used to quantize source words $x^n((k+1)(m-1)+1), \ldots,
x^n((k+1)m)$; only after the source word $x^n((k+1)m)$ is
encoded, the codebook is innovated according to the codebook
innovation operation (S1-S4) and then is held fixed (including
the counters in the second part of the codebook) and reused
for the next time period of length $k$.

GW algorithm with finitely many codebook innovations:
This version is a variant of the GW algorithm with codebook
innovation interval $k$ where, after finitely many codebook
innovations, the codebook is held fixed and reused to quantize
the incoming successive source words.

II.  Optimality  Results 

Let $\rho : A \times \hat{A} \to [0, +\infty)$ be a single-letter distortion
measure. Given a stationary, ergodic source $\mu$ with random
output $\{X_i\}_{i=1}^{\infty}$, let $D(R)$ denote its distortion rate function with
respect to the fidelity criterion $\{\rho_n\}$, where $\rho_n(x^n, y^n) =
n^{-1} \sum_{i=1}^{n} \rho(x_i, y_i)$ for $x^n \in A^n$ and $y^n \in \hat{A}^n$. If a stationary,
ergodic source $\mu$ with random output $X = \{X_i\}_{i=1}^{\infty}$ is encoded
by the GW algorithm with codebook innovation interval $k(n)$,
the expected distortion per symbol is defined by

$$\rho(n, \mu) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E \rho_n(X^n(t), C_n(t-1)), \qquad (1)$$

where $\rho_n(X^n(t), C_n(t-1))$ is the minimum of $\rho_n(X^n(t), y^n)$
over all $y^n \in C_n(t-1)$ and "$E$" denotes the expectation with
respect to $X^n(t)$ and $C_n(t-1)$. The following is our optimality
result concerning the GW algorithm with codebook innovation
interval $k(n)$.

Theorem 1 Let $\mu$ be a stationary, $\phi$-mixing source having
the blowing-up property and whose $\phi$-mixing coefficients satisfy
$\phi(k(n)n) L^{-1} \to 0$ as $n \to \infty$. Then

$$\rho(n, \mu) \to D(R) \quad \text{as } n \to \infty.$$

When $\mu$ is a strong mixing Markov (or finite-state) source,
Theorem 1 can be strengthened to almost sure convergence.
Similar results hold for the GW algorithm with finitely many
codebook innovations.

Abstract — Output probability distributions of the
test channels play important roles in data compression
of discrete memoryless sources with a fidelity criterion.
In this paper a universal algorithm for estimating the
output probability distributions is proposed. The sample
size required by the algorithm is evaluated under a
criterion of estimation similar to that of PAC learning
in computational learning theory.

I.  Introduction 

The rate-distortion function describes a basic lower bound on the
compression efficiency asymptotically attainable by a data
compression scheme with a fidelity criterion. For a discrete
memoryless source with finite alphabet $A = \{a_1, a_2, \ldots, a_J\}$ it is defined
as a minimum of the mutual information as follows:

$$R(p, D) = \min_{W \in \mathcal{W}(p, D)} I(p; W), \qquad (1)$$

where $p = (p(a_1), p(a_2), \ldots, p(a_J))$ denotes the probability
distribution of the source and $\mathcal{W}(p, D)$ is the set of $J \times J$
stochastic matrices each element of which causes average distortion
per symbol within $D$ under a single-letter fidelity criterion
$d : A \times A \to [0, \infty)$ satisfying $d(a_j, a_k) = 0$ if and only if $j = k$.
The rate-distortion function is positive for all $D \in [0, D_{\max})$,
where $D_{\max} \stackrel{\text{def}}{=} \min_k \sum_{j=1}^{J} p(a_j) d(a_j, a_k)$. Fix $\Delta \in (0, D_{\max})$
arbitrarily and denote by $W^*$ the test channel matrix achieving
the minimum in (1). The probability distribution on $A$
defined by $p^*(a_k) = \sum_{j=1}^{J} p(a_j) W^*(a_k \mid a_j)$, $k = 1, 2, \ldots, J$,
is the output probability distribution of the test channel
corresponding to the distortion level $\Delta$. In this note a
universal estimation algorithm for $p^*$ is proposed and the sample
size required by the algorithm is evaluated.

Suppose that, in addition to the source to be compressed, another
discrete memoryless source with the same alphabet $A$
is available to the estimation algorithm. Denote by $q =
(q(a_1), q(a_2), \ldots, q(a_J))$ the probability distribution of this
other source, called the auxiliary source. Assume that $q(a_j) > 0$
for all $a_j \in A$ satisfying $p^*(a_j) > 0$. For an arbitrarily fixed $n$
let $\mathcal{X} = \{x_1, x_2, \ldots, x_L\}$ be $L$ $n$-tuples drawn independently
from the source and $\mathcal{Y} = \{y_1, y_2, \ldots, y_M\}$ be $M$ $n$-tuples
drawn from the auxiliary source. By using the two sets $\mathcal{X}$ and
$\mathcal{Y}$, the algorithm outputs $\hat{p}^*$ as an estimate of $p^*$ satisfying

$$\mathrm{Prob}(D(\hat{p}^* \| p^*) > \epsilon) < \delta \qquad (2)$$

for any given $\epsilon > 0$ and $\delta \in (0, 1)$ if $n$ is sufficiently large,
where Prob means the probability with respect to $\mathcal{X} \times \mathcal{Y}$. The
criterion of estimation (2) is deeply related to a data
compression scheme with a fixed data-base proposed by Steinberg and
Gutman [1] and analyzed in detail by Koga and Arimoto [2].


Moreover, imposing the criterion (2) is the first attempt to
introduce a viewpoint of the PAC (Probably Approximately
Correct) learning models proposed by Valiant [3] to data
compression with a fidelity criterion.


II.  Main  Results 

It is assumed that the estimation algorithm can use an
estimate of $p$, denoted by $p_e$, satisfying $\|p - p_e\|_1 = O(n^{-\beta_e})$ for
any fixed $\beta_e \in (0, \tfrac{1}{2}]$. It estimates $p^*$ in the following manner:

Algorithm 1

1) Choose $\alpha > 0$ and $\beta \in (0, \beta_e)$ arbitrarily.
Derive $\mathcal{X} = \{x_1, x_2, \ldots, x_L\}$ from the source and
$\mathcal{Y} = \{y_1, y_2, \ldots, y_M\}$ from the auxiliary source. Fix an
integer $m_0$ arbitrarily satisfying $1 \le m_0 \le M$.

2) For all $m = 1, 2, \ldots, M$ define $\mathcal{N}(y_m, \Delta)$ by

$$\mathcal{N}(y_m, \Delta) = \{x \in \mathcal{X} \mid d_n(x, y_m) \le \Delta \text{ and } \|p_e - t(x)\|_1 \le n^{-\beta}\}, \qquad (3)$$

where $d_n$ denotes the distortion between $n$-tuples defined by
$d$, and $t(x)$ denotes the type of $x$. Search for the integer
$m^*$ maximizing $|\mathcal{N}(y_m, \Delta)|$.

3) If $|\mathcal{N}(y_{m^*}, \Delta)| \ge n^{\alpha}$, output $t(y_{m^*})$. Otherwise, output
$t(y_{m_0})$.

Under the assumption that $p^*$ is unique, lower bounds on $L$
and $M$ guaranteeing that Algorithm 1 meets the criterion (2) are
established in the following theorem.

Theorem 1 Let $R_X = \frac{1}{n} \log_2 L$ and $R_Y = \frac{1}{n} \log_2 M$. Then
for any fixed $\Delta \in (0, D_{\max})$, if the two inequalities

$$\min_{q' : D(p^* \| q') \le \epsilon} \ \min_{V \in \mathcal{V}(p, q', \Delta)} I(q'; V) < R_X < R(p, \Delta), \qquad (4)$$

$$R_Y > D(q_\epsilon \| q) \qquad (5)$$

are satisfied, then there exists an integer $n_0$ such that
Algorithm 1 outputs $\hat{p}^*$ meeting the criterion (2) for all
$n \ge n_0$, where $I(q'; V)$ denotes the mutual information,
$\mathcal{V}(p, q', \Delta)$ denotes the set of $J \times J$ stochastic matrices
satisfying $\sum_{k=1}^{J} q'(a_k) V(a_j \mid a_k) = p(a_j)$ for all $j = 1, 2, \ldots, J$ and
$\sum_{j=1}^{J} \sum_{k=1}^{J} q'(a_k) V(a_j \mid a_k) d(a_j, a_k) \le \Delta$, and $q_\epsilon$ is a
probability distribution on $A$ that achieves the minimum in (4) with
a stochastic matrix $V \in \mathcal{V}(p, q_\epsilon, \Delta)$. □

Abstract — A data-base for data compression is
universal if in its construction no prior knowledge of the
source distribution is assumed and is optimal if, when
we encode the reference index of the data-base, its
encoding rate achieves the optimal encoding rate for any
given source: in the noiseless case the entropy rate
and in the semifaithful case the rate-distortion
function of the source. We construct a universal data-base
for all stationary ergodic sources, and prove the
optimality of the data-base thus constructed for a
block-shift type reference and a single-shift type reference.

I.  Introduction 

We  consider  the  case  where  both  a  sender  and  a  receiver  have 
the  same  data-base  on  their  respective  sides.  In  this  case,  we 
can  transmit  a  source  output  in  the  following  way:  the  sender 
refers  to  the  data-base  for  the  data  string  which  matches  the 
given  source  output  and  then  encodes  the  reference  index  of 
the  data  string  to  send  it  out.  The  receiver  then  decodes  the 
encoded  index  to  retrieve  the  data  string  from  the  data-base 
and  then  uses  it  to  represent  the  source  output.  There  are  two 
typical  conceivable  methods  of  referring  to  the  data-base:  one 
is  a  block-shift  type  reference  and  the  other  is  a  single-shift 
type  reference.  Either  method  can  achieve  data  compression 
if the number of bits needed to encode the reference index relative
to the data-base is smaller than that needed to represent
the  source  output  itself.  Hereafter  we  refer  to  the  number  of 
bits  divided  by  the  sequence  length  of  the  source  output  as 
the  encoding  rate. 

We construct an optimal universal data-base for ergodic sources. The
construction of our data-base sequence relies entirely on the basic
concept of the complexity function (cf. [1]): it is constructed by
ordering data strings according to increasing complexity. The
data-base sequence obtained can be applied to both the block-shift
type and the single-shift type reference cases.

It should be noted that this data-base can also be proved optimal for
the fixed-rate universal code with distortion (cf. [3]).

II. Block-Shift Type Reference Case

Let $A$ be a finite set and let $L$ be a complexity function in the
almost sure sense, as defined in [1].

Definition 1 Let the elements of the set $A^n$ be ordered according to
increasing complexity (ties may be broken in an arbitrary order). The
mapping which maps an element of $A^n$ into its order is called an
index function induced by $L$ and is denoted by $\mathcal{L}_{L,n}$. A
data-base sequence corresponding to the index function
$\mathcal{L}_{L,n}$ is defined by

$$u_1^{n|A^n|} = \mathcal{L}_{L,n}^{-1}(1) * \mathcal{L}_{L,n}^{-1}(2) * \cdots * \mathcal{L}_{L,n}^{-1}(|A^n|),$$

where the notation $*$ denotes concatenation of strings.

Next, let $A$ be a standard space and let $\rho$ be a distortion
function which satisfies some conditions stated in [1].


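Definition 2 A D-semifaithful index function $\mathcal{L}_{L,D,n}$ is
defined by

$$\mathcal{L}_{L,D,n}(x^n) = \min_{\hat{x}^n \in A_D(x^n)} \mathcal{L}_{L,n}(\hat{x}^n) = \min\{ i;\ u_{n(i-1)+1}^{ni} \in A_D(x^n) \}, \quad x^n \in A^n,$$

where $u_i^j = (u_i, \ldots, u_j)$ and

$$A_D(x^n) = \{ \hat{x}^n \in A^n;\ \rho_n(x^n, \hat{x}^n) \le D \}.$$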

Theorem 1 For any $A$-valued stationary ergodic source $X$,

$$\lim_{n \to \infty} \frac{1}{n} \log_2 \mathcal{L}_{L,n}(x^n) = H_X, \quad P_X\text{-a.s.},$$

and for any $A$-valued stationary ergodic source $X$ and $D > D_0$,

$$\lim_{n \to \infty} \frac{1}{n} \log_2 \mathcal{L}_{L,D,n}(x^n) = R_X(D), \quad P_X\text{-a.s.},$$

where $H_X$ and $R_X(D)$ are the entropy rate of the source $X$ and
the rate-distortion function of the source $X$, respectively.


III. Single-Shift Type Reference Case

We now consider the case when a data-base sequence is referred to by
the single-shift type reference.

Definition 3 We define a function $S_{L,n}$ by

$$S_{L,n}(x^n) = \min\{ s;\ x^n = u_s^{s+n-1} \}, \quad x^n \in A^n,$$

and refer to it as the index function for the single-shift type
reference. We also define a function $S_{L,D,n}$ by

$$S_{L,D,n}(x^n) = \min_{\hat{x}^n \in A_D(x^n)} S_{L,n}(\hat{x}^n) = \min\{ s;\ u_s^{s+n-1} \in A_D(x^n) \}, \quad x^n \in A^n,$$

and refer to it as the D-semifaithful index function for the
single-shift type reference.

Theorem 2 For any $A$-valued stationary ergodic source $X$,

$$\lim_{n \to \infty} \frac{1}{n} \log_2 S_{L,n}(x^n) = H_X, \quad P_X\text{-a.s.},$$

and for any $A$-valued stationary ergodic source $X$ and $D > D_0$,

$$\lim_{n \to \infty} \frac{1}{n} \log_2 S_{L,D,n}(x^n) = R_X(D), \quad P_X\text{-a.s.}$$
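
The two reference types can be illustrated with a small sketch. It assumes a binary alphabet, a short block length, per-letter Hamming distortion, and a toy stand-in for the complexity function (the number of runs in a string). That stand-in is an assumption made only for illustration; the complexity function of [1] is not reproduced here, and all function names are hypothetical.

```python
from itertools import product

def toy_complexity(s):
    """Hypothetical stand-in for the complexity function L: number of runs."""
    return 1 + sum(a != b for a, b in zip(s, s[1:]))

def build_database(alphabet, n, complexity=toy_complexity):
    """Order A^n by increasing complexity and concatenate the strings."""
    # sorted() is stable, so ties keep the order produced by product().
    strings = sorted(product(alphabet, repeat=n), key=complexity)
    database = [sym for s in strings for sym in s]
    return strings, database

def block_shift_index(x, strings):
    """1-based rank of x in the complexity ordering (block-shift reference)."""
    return strings.index(tuple(x)) + 1

def single_shift_index(x, database):
    """Smallest 1-based s such that x equals database positions s..s+n-1."""
    n = len(x)
    for s in range(len(database) - n + 1):
        if database[s:s + n] == list(x):
            return s + 1
    raise ValueError("string does not occur in the database")

def semifaithful_block_index(x, strings, D):
    """Smallest rank of a string within average Hamming distortion D of x."""
    n = len(x)
    for i, cand in enumerate(strings, start=1):
        if sum(a != b for a, b in zip(x, cand)) / n <= D:
            return i
    raise ValueError("no string within distortion D")
```

Because every block boundary is also a valid single-shift position, the single-shift index never exceeds n(i - 1) + 1 when the block-shift index is i.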

Two practical universal source coding schemes based on approximate
string matching are proposed. One is an approximate fixed-length
string matching data compression, and the other is an LZ-type quasi
parsing method by approximate string matching. It is shown that in the
former algorithm the compression rate converges to the theoretical
bound of R(D) for ergodic and stationary processes as the average
string length tends to infinity. A similar result holds for the latter
algorithm in the limit of the infinite database produced by the former
algorithm. The main advantages of the proposed methods are the
asymptotic behavior of the encoder implementation and the simplicity
of the decoder. Practical results of image and voice compression will
be presented.

Definition 1. We look at the positive-time sequence $u_0, u_1, \ldots$.
Let $L$ be the first index such that the string $u_0 \ldots u_{L-1}$
is not a substring of the data-base $u_{-n}^{-1}$. That $L$ is equal
to $L_n(u)$.

Definition 2. The random variable $N_l(u)$ for $l > 0$ is the smallest
integer $N \ge l$ such that $u_0^{l-1} = u_{-N}^{l-1-N}$.

Given alphabets $U$ and $V$, a distortion measure is any function
$d : U \times V \to \mathbb{R}^+$. Let $\rho_l(u; v)$ denote the
distortion for a block, that is, the average of the per-letter
distortions for the letters that comprise the block,

$$\rho_l(u; v) = \frac{1}{l} \sum_{k=0}^{l-1} d(u_k; v_k).$$

Definition 3. For each sample sequence $u$ of length $l$, taken from
the sequence $u$, we define a set D-Ball,

$$\text{D-Ball}(u) = \{ v : \rho_l(u, v) \le D \}.$$

Definition 4. For each sample sequence $u$ we define the random
variable

$$DL_n(u, v_{-n}^{-1}) = \max_{v :\, \rho(u,v) \le D} L_n(v, v_{-n}^{-1}).$$

Definition 5. For each sample sequence $u$ we define the random
variable

$$DN_l(u, v_{-n}^{-1}) = \min_{v :\, \rho(u,v) \le D} N_l(v, v_{-n}^{-1}).$$


Data Compression Scheme A.

1. Verify the readiness of the decoder.

2. Take a string $u = u_0^{l-1}$ of length $l$.

3. If $u_0^{l-1}$ can be approximately matched up to tolerance $D$ by
a substring of $v_{-n}^{-1}$, encode it by specifying
$DN_l(u, v_{-n}^{-1})$ to the decoder. Add a bit as a header flag to
indicate that there is a match. Append the string
$v_{-DN_l}^{l-1-DN_l}$ to the database in the decoder and encoder at
position 0.

4. If not, indicate that there is no match, and transmit to the
decoder and append to the database in the encoder and decoder the
string $v_0^{l-1}$, which is the best D-Ball center, obtained by
blockcoding on the current $u_0^{l-1}$ string and based on the
accumulated empirical distribution in the past of $u$. Blockcoding
algorithms are known in the literature. The codeword is transmitted as
is, without compression.

5. Shift the indices by $l$ to the appropriate values. Update $n$ to
$n + l$. Repeat the process from step 1, with a new string of length
$l$ and a database $v_{-n}^{-1}$.
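
One pass through steps 2 to 5 of Scheme A can be sketched as follows. This is an illustrative sketch only: it assumes Hamming distortion, stores the database as a Python list whose last entry is position -1, and in the no-match branch simply appends the source block itself, which stands in for the blockcoding step that the text does not specify. All names are hypothetical.

```python
def block_distortion(u, v):
    """Average per-letter Hamming distortion between equal-length blocks."""
    return sum(a != b for a, b in zip(u, v)) / len(u)

def find_dn(u, database, D):
    """Smallest N >= l whose window, starting N places back, is within D of u.

    Returns (N, window) or None when no window of the database qualifies.
    """
    l, n = len(u), len(database)
    for N in range(l, n + 1):
        window = database[n - N:n - N + l]
        if block_distortion(u, window) <= D:
            return N, window
    return None

def scheme_a_step(u, database, D):
    """Encode one block; return (message, updated database)."""
    match = find_dn(u, database, D)
    if match is not None:
        N, window = match
        # Match: send the flag and the pointer, append the matched window.
        return ("match", N), database + list(window)
    # No match: send the block uncompressed (assumed stand-in for the
    # best D-Ball center obtained by blockcoding).
    return ("literal", list(u)), database + list(u)
```

The decoder can mirror `scheme_a_step` exactly, since in both branches the appended block is recoverable from the message and the shared database.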


Limit Theorem A. Given is a D-semifaithful database $v_{-n}^{-1}$
generated by Scheme A from a stationary ergodic process $u$. We assume
that the system preserves ergodicity and stationarity. For all
$\beta > 0$,

$$\lim_{l \to \infty} \Pr\left\{ \left| \frac{\log DN_l(u, v_{-n}^{-1})}{l} - R(D) \right| > \beta \right\} = 0.$$

The average compression ratio attains the bound $R(D)$.


Scheme B.

1. $l = 1$.

2. Take the string $u_0^{l-1}$ of length $l$.

3. If $u_0^{l-1}$ can be approximately matched up to tolerance $D$ by
a substring of $v_{-n}^{-1}$, store a pointer $N$ to that substring
and increment $l$. Go to step 2.

4. If not, append to the database track the string
$v_{-N}^{l-2-N}$ at position zero and further, and append the letter
$v_{l-1}$, the reproducing letter which satisfies
$d(u_{l-1}, v_{l-1}) = 0$. The encoding is done by the pointer to the
string $v_{-N}^{l-2-N}$, the length $DL_n(u)$ and the last reproducing
letter associated to the last source letter.

5. Repeat the process from step 1, where the database is appended with
the chosen string denoted by $v_0^{l-1}$.


Limit Theorem B. Given is a suffix $v_{-n}^{-1}$ taken from an
infinite database generated by an encoder-decoder pair as described in
Scheme A. At time zero switch to Scheme B. As the memory size $n$
tends to infinity, for the new sample sequence $u$ encoded from the
stationary ergodic input $u$ by Scheme B, in probability,

$$\lim_{n \to \infty} \frac{\log n}{DL_n(u, v_{-n}^{-1})} = R(D).$$

Abstract - Upper bounds for certain exponential sums over Galois rings
are presented. The sums may be regarded as the Galois ring analogues
of the so-called Kloosterman sums and related exponential sums with a
Laurent polynomial argument. An application of the bounds to the
design of large families of polyphase sequences with good correlation
properties is also given.


I. Introduction

Let $\psi : GR(q, m) \to \mathbb{C}^*$ be an additive character of the
characteristic $q = p^e$ ($p$ prime) Galois ring of $q^m$ elements.
Let $T$ denote the subset of $GR(q, m)$ consisting of the zero element
and the powers of an element $\alpha$ of multiplicative order
$p^m - 1$.

In [1] Kumar, Helleseth and Calderbank studied exponential sums of the
type

$$\sum_{x \in T} \psi(f(x)),$$

where $f(x)$ is a polynomial with coefficients in the ring $GR(q, m)$.
They apply the theory of function fields of algebraic curves and their
characters. The same technique allows us to study such sums where, in
place of the polynomial $f(x)$, we have a Laurent polynomial, i.e. we
allow negative powers of $x$ as well. Observe that this makes sense in
$T^* = T \setminus \{0\}$, as all the elements of $T^*$ are units of
the ring. Our technique differs from the approach in [1] in that we
use a Witt vector presentation of the Galois rings. For example, we
view the ring $GR(4, m)$ as the ring of Witt vectors $W_2(F)$ of
length two over the field $F \cong GF(2^m)$. The elements of $W_2(F)$
are ordered pairs $(\alpha_0, \alpha_1)$, $\alpha_i \in F$, and the
ring operations on two such pairs are defined as

$$(\alpha_0, \alpha_1) + (\beta_0, \beta_1) = (\alpha_0 + \beta_0,\ \alpha_1 + \beta_1 + \alpha_0 \beta_0),$$

$$(\alpha_0, \alpha_1) \cdot (\beta_0, \beta_1) = (\alpha_0 \beta_0,\ \alpha_1 \beta_0^2 + \beta_1 \alpha_0^2),$$

where the arithmetical operations between the individual components
are the usual field operations. Our set $T$ then consists of the pairs
$(\beta, 0)$, $\beta \in F$. Similarly, the rings $GR(8, m)$ can be
viewed as rings of Witt vectors of length three. For a description of
the arithmetic of Witt vectors of arbitrary length and characteristic
we refer the interested reader to Jacobson [3, section 8.10].

II. Results

We have proven the following results:

Theorem 1 Let $q = 4$ and let $\alpha, \beta \in GR(4, m)$ be
arbitrary, excluding the case $\alpha = \beta = 0$. Then

$$\left| \sum_{x \in T^*} \psi\left( \alpha x + \frac{\beta}{x} \right) \right| \le 4\sqrt{2^m}.$$

Theorem 2 Let $q = 8$ and let $\alpha_1, \beta_1 \in GR(8, m)$ and
$\alpha_3, \beta_3 \in 4GR(8, m)$ be such that at least one of them
differs from zero. Then

$$\left| \sum_{x \in T^*} \psi\left( \alpha_1 x + \frac{\beta_1}{x} + \alpha_3 x^3 + \frac{\beta_3}{x^3} \right) \right| \le 8\sqrt{2^m}.$$

As is the case with the usual Kloosterman sums, we have the additional
result that the associated hybrid sums

$$\sum_{x \in T^*} \psi(f(x))\, \chi(x)$$

have exactly the same bounds. Here $f(x)$ is any of the Laurent
polynomials appearing in the above results and $\chi$ is a
multiplicative character of the group $T^*$, taking values in the
powers of $\omega = e^{2\pi i/(p^m - 1)}$. Such hybrid sums can be
used either to analyze the aperiodic correlation properties of the
resulting family of sequences or to get an even larger family with a
very large alphabet. After submitting this note I have learned that
Helleseth, Kumar and Shanbhag have obtained more general versions of
the above theorems [2]. However, they did not consider the associated
hybrid sums.
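
The length-two Witt vector arithmetic over a field of characteristic 2 can be checked numerically. The sketch below is an illustration under stated assumptions: field elements of GF(2^m) are stored as bit masks, the reduction polynomial is supplied by the caller, and the class and function names are hypothetical. The second component of the product uses the squared first components, as in the standard length-two Witt multiplication.

```python
def gf_mul(a, b, m, poly):
    """Multiply two elements of GF(2^m) stored as bit masks.

    poly is the irreducible reduction polynomial including the x^m term.
    """
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r

class Witt2:
    """Witt vectors of length two over GF(2^m), as pairs (a0, a1)."""

    def __init__(self, m, poly):
        self.m, self.poly = m, poly

    def fmul(self, a, b):
        return gf_mul(a, b, self.m, self.poly)

    def add(self, x, y):
        (a0, a1), (b0, b1) = x, y
        # Field addition is XOR; the carry term is a0 * b0.
        return (a0 ^ b0, a1 ^ b1 ^ self.fmul(a0, b0))

    def mul(self, x, y):
        (a0, a1), (b0, b1) = x, y
        f = self.fmul
        return (f(a0, b0), f(a1, f(b0, b0)) ^ f(b1, f(a0, a0)))
```

For m = 1 the pair (a0, a1) corresponds to the integer a0 + 2 a1 modulo 4, which gives a quick sanity check against ordinary arithmetic in Z_4.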

III. Applications to Sequence Design

The above character sums appear naturally as correlation values of
certain families of sequences. To arrive at the families, all one has
to do is to select representatives of cyclically distinct classes of
associated codewords of period $L = |T^*| = 2^m - 1$. Our character
sums yield families with the following parameters for all $m > 1$:

• Quaternary family of size $L^3$ and maximal correlation
$1 + 4\sqrt{L + 1}$,

• Eight-phase family of size $L^7$ and maximal correlation
$1 + 8\sqrt{L + 1}$,

• Polyphase family of size $L^4$ and maximal correlation
$1 + 4\sqrt{L + 1}$, and

• Polyphase family of size $L^8$ and maximal correlation
$1 + 8\sqrt{L + 1}$.

Here the 'polyphase' families have alphabets of sizes $4L$ and $8L$
respectively, effectively filling in the unit circle of the complex
plane.

Abstract - In this paper, GMW sequences and families of No sequences
are generalized. Generalized GMW sequences have ideal autocorrelation
properties and balance properties, and generalized No sequences also
have optimal correlation properties in terms of Welch's lower bound.
The linear spans of the generalized GMW sequences and generalized No
sequences appear to be large, although we do not at present have a
closed-form expression for the linear span. A count of the numbers of
cyclically distinct generalized GMW sequences and generalized No
sequences that can be constructed is provided.


I. INTRODUCTION

In this paper, the generalization of GMW sequences and No sequences is
introduced. In Section II, GMW sequences are generalized, their ideal
full-period autocorrelation properties are derived, and a count of the
number of cyclically distinct generalized GMW sequences that can be
constructed is provided. It is also shown how the families of No
sequences can be generalized in an identical fashion, and their
optimal correlation properties are described in Section III. The
number of distinct families of generalized No sequences of given
period is described there, too.

II. GENERALIZATION OF GMW SEQUENCES

We can define generalized GMW sequences as follows:

Definition 1: Let $n$ and $m_i$, $i = 1, 2, \ldots, d$, be integers
satisfying

$$m_d \mid n \quad \text{and} \quad m_i \mid m_{i+1} \ \text{for } 1 \le i \le d - 1. \qquad (1)$$

A generalized GMW sequence is then defined as the multiple trace
function sequence of period $N$ given by

$$s_g(t) = tr_1^{m_1}\{[tr_{m_1}^{m_2}\{[tr_{m_2}^{m_3}\{ \cdots \{[tr_{m_d}^{n}(\alpha^t)]^{r_d}\} \cdots \}]^{r_2}\}]^{r_1}\}, \qquad (2)$$

where $\alpha$ is an element of order $N = 2^n - 1$ and, for
$1 \le i \le d$,

$$\gcd(r_i, 2^{m_i} - 1) = 1, \quad 1 \le r_i \le 2^{m_i} - 1. \qquad (3)$$

The generalized GMW sequence has ideal full-period autocorrelation
values, and the number of such sequences can be counted as follows:

Theorem 1: The number of cyclically different generalized GMW
sequences of given period $N$ is given by

$$N_{gGMW} = \frac{\phi(2^n - 1)}{n} \prod_{i=1}^{d} \frac{\phi(2^{m_i} - 1)}{m_i}, \qquad (4)$$

where $\phi(\cdot)$ is Euler's phi function.

III. GENERALIZATION OF NO SEQUENCES

The definition of a generalized No sequence family is given as
follows:

Definition 2: Let $n$ and $m_i$, $i = 1, 2, \ldots, d$, be integers
satisfying

$$n = 2 m_d \quad \text{and} \quad m_i \mid m_{i+1} \ \text{for } 1 \le i \le d - 1. \qquad (5)$$

A family of generalized No sequences

$$S_g = \{ s_i(t) \mid 0 \le t \le N - 1,\ 1 \le i \le 2^{m_d} \} \qquad (6)$$

is a set of multiple trace function sequences defined as

$$s_i(t) = tr_1^{m_1}\{[tr_{m_1}^{m_2}\{ \cdots \{[tr_{m_d}^{n}(\alpha^{2t}) + \gamma_i \alpha^{Tt}]^{r_d}\} \cdots \}]^{r_2}\}]^{r_1}\}, \qquad (7)$$

where $N = 2^n - 1$, $\gamma_i$ is in $GF(2^{m_d})$,
$T = 2^{m_d} + 1$, and for $1 \le i \le d$,

$$\gcd(r_i, 2^{m_i} - 1) = 1, \quad 1 \le r_i \le 2^{m_i} - 1. \qquad (8)$$

The full-period correlation function of No sequences is the same as
that of Kasami sequences. Counts for the number of cyclically distinct
generalized GMW sequences and generalized No sequence families are the
same.
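
As a concrete check of the ideal autocorrelation property, the sketch below generates the simplest case, a classical GMW sequence with a single intermediate field (d = 1, n = 6, m_1 = 3, r_1 = 3), and evaluates its periodic autocorrelation. The field representation is an assumption made for the example: GF(2^6) is built from the primitive polynomial x^6 + x + 1 with alpha = x, and the function names are hypothetical.

```python
def gf_mul(a, b, n=6, poly=0b1000011):
    """Multiply in GF(2^6) defined by x^6 + x + 1 (elements as bit masks)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> n) & 1:
            a ^= poly
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^6)."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def gmw_sequence(r=3):
    """s(t) = tr_1^3([tr_3^6(alpha^t)]^r) for t = 0, ..., 62."""
    seq, x = [], 1
    for _ in range(63):
        y = x ^ gf_pow(x, 8)                          # tr_3^6(x) = x + x^8, in GF(8)
        z = gf_pow(y, r)
        seq.append(z ^ gf_pow(z, 2) ^ gf_pow(z, 4))   # tr_1^3(z), equal to 0 or 1
        x = gf_mul(x, 2)                              # advance to the next power of alpha
    return seq

def periodic_autocorrelation(seq, tau):
    """Full-period autocorrelation of the +/-1 version of a binary sequence."""
    n = len(seq)
    return sum((-1) ** (seq[t] ^ seq[(t + tau) % n]) for t in range(n))
```

Evaluating `periodic_autocorrelation(gmw_sequence(), tau)` over all nonzero shifts is a direct way to confirm the two-valued autocorrelation for this small case.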

REFERENCES 

1. J. S. No, "A new family of binary pseudorandom sequences having
optimal periodic correlation properties and large linear span," Ph.D.
dissertation, University of Southern California, Los Angeles, CA, USA,
May 1988.

2. J. S. No and P. V. Kumar, "A new family of binary pseudorandom
sequences having optimal periodic correlation properties and large
linear span," IEEE Trans. Inform. Theory, vol. IT-35, no. 2,
pp. 371-379, Mar. 1989.

3. A. Klapper, "d-form sequences: Families of sequences with low
correlation values and large linear spans," IEEE Trans. Inform.
Theory, vol. 41, pp. 423-431, Mar. 1995.




Codes  for  Optical  Transmission  at  Different  Rates 

O. Moreno¹ and Svetislav V. Maric

Department  of  Mathematics  and  Computer  Science,  University  of  Puerto  Rico,  Rio  Piedras,  00931. 
Department  of  Electrical  Engineering,  City  College  of  the  City  University  of  New  York,  New  York,  NY  10031. 


Abstract - Constructions for families of cyclic constant weight codes
are presented to be used in fiber optic CDMA networks for multirate
transmission. It is shown that the discussed code families satisfy the
requirements for successful transmission of different data rates using
the CDMA technique.

I.  Introduction 

An $(n, \omega, \lambda)$-optical orthogonal code (OOC) (see [1], [2],
[3]) $C$, $n \ge 1$, $1 \le \omega \le n$, $1 \le \lambda \le \omega$,
is a family of {0, 1}-sequences of length $n$ and Hamming weight
$\omega$ satisfying the following auto- and cross-correlation
conditions:

$$\sum_{k=0}^{n-1} x(k)\, x(k \oplus_n \tau) \le \lambda \qquad (1)$$

for all sequences $x(\cdot) \in C$ and all integers
$\tau \not\equiv 0 \pmod n$, and

$$\sum_{k=0}^{n-1} x(k)\, y(k \oplus_n \tau) \le \lambda \qquad (2)$$

for all pairs of distinct sequences $x(\cdot), y(\cdot) \in C$ and all
integers $\tau$, where $\oplus_n$ denotes addition modulo $n$.

Codes with these properties have been called optical orthogonal codes
in papers [1]-[4] in connection with applications to optical channels,
and cyclically permutable constant weight (CPCW) codes (see [5] and
the references there) in connection with the construction of protocol
sequences for the multiuser collision channel without feedback.

For a given set of values of $n$, $\omega$ and $\lambda$, let
$\Phi(n, \omega, \lambda)$ denote the largest possible cardinality of
an $(n, \omega, \lambda)$-optical orthogonal code. Upper bounds for
this function and several optimal constructions for $\lambda = 1$ and
2 can be found in [1]-[3]. An easy upper bound derived from the
Johnson bound (see [1]) states that

$$\Phi(n, \omega, \lambda) \le \frac{1}{n} A(n, 2\omega - 2\lambda, \omega) \le \frac{(n-1)(n-2) \cdots (n-\lambda)}{\omega(\omega-1) \cdots (\omega-\lambda)}. \qquad (3)$$

In a multimedia environment different types of users transmit at
different data rates [6]. As a most obvious example, in Personal
Communication Networks we have low-rate voice transmissions and
high-rate data transmissions.

Note that in a multirate case a CPCW with a longer length corresponds
to lower data bit rates and the smaller length CPCW corresponds to
higher data bit rates. Hence for multimedia applications we need CPCW
families with different lengths and weights. The code construction is
complicated by the fact that now we need to establish not only
correlation properties (value of $\lambda$) of one CPCW family but
also cross-correlation properties of families of CPCW with different
$n$ and $\omega$.

II. Constructions

In [3], three constructions (A, B and C) for families of OOC's are
presented. In every case, the families are asymptotically optimum in
the sense that, as the length of the sequence family tends to
infinity, the ratio of the size of the OOC to that of the maximum
permissible as determined by the bound in (3) above approaches unity.

All three constructions make use of the following two ideas. Let $n$
be an integer that can be expressed as the product $n = n_1 n_2$ of
two relatively prime integers $n_1$ and $n_2$. Then, from an
application of the Chinese remainder theorem, it follows that the
construction of sets of {0, 1} sequences with periodic correlation
bounded above by $\lambda$ is completely equivalent to the task of
constructing a collection of arrays whose doubly-periodic correlation
is bounded above by $\lambda$. Secondly, the codewords within each
family are required to have constant weight. The sequences in each of
the three families A, B and C, when represented in matrix form, appear
as the graph of a function mapping $Z_{n_2} \to Z_{n_1}$. This
guarantees that they all have constant weight (approximately) $n_2$.
The functions in A and B are polynomials, whereas construction C uses
rational functions.

In this talk, we will show that Construction A can be used to
construct a nested chain of asymptotically optimum OOC's of lengths
$n_0 = n$, $n_i \mid n_0$, $i \ge 1$. Using on-off keying as the
method of data modulation, we show how this nested chain can be used
to efficiently allow several users with different information rates to
simultaneously transmit information. Decoding of the desired
information stream is easily accomplished using correlation detection.

Such codes are relevant to multimedia communications.

¹ This research is supported in part by the National Science
Foundation under Grant numbers RII-9014056, NCR-890505, and the
Computational Mathematics Group of the EPSCoR of Puerto Rico Grant.
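
The correlation conditions and the simplified upper bound on the code size can be checked by brute force for small parameters. The sketch below is illustrative only: the function names are hypothetical, codewords are given as 0/1 lists, and the two example supports {0, 1, 4} and {0, 2, 7} modulo 13 are a standard small example chosen here, not a construction from the paper.

```python
def correlation(x, y, tau):
    """Periodic correlation: sum over k of x(k) * y(k + tau mod n)."""
    n = len(x)
    return sum(x[k] * y[(k + tau) % n] for k in range(n))

def is_ooc(code, lam):
    """Check the auto- and cross-correlation conditions for threshold lam."""
    n = len(code[0])
    for i, x in enumerate(code):
        # Autocorrelation at every nonzero shift.
        if any(correlation(x, x, tau) > lam for tau in range(1, n)):
            return False
        # Cross-correlation with every other codeword at every shift.
        for y in code[i + 1:]:
            if any(correlation(x, y, tau) > lam for tau in range(n)):
                return False
    return True

def simplified_bound(n, w, lam):
    """Floor of (n-1)...(n-lam) / (w (w-1)...(w-lam))."""
    num = den = 1
    for j in range(1, lam + 1):
        num *= n - j
    for j in range(lam + 1):
        den *= w - j
    return num // den

def from_support(n, support):
    """Build a 0/1 sequence of length n with ones at the given positions."""
    return [1 if k in support else 0 for k in range(n)]
```

For n = 13, weight 3 and threshold 1, the two example codewords pass the check and already meet the simplified bound, so no third codeword can be added.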

Abstract - An upper bound for the extended Kloosterman sum over Galois
rings is derived. This bound is then used to construct new, efficient
sequence families with prime-power phase.

I.  Introduction 

For a fixed prime $p$ and integers $e$, $m$, $e \ge 2$, $m \ge 1$, let
$R_{e,m}$ denote the Galois ring of characteristic $p^e$ containing
$p^{em}$ elements. Let $\psi_{e,m}$ be a non-trivial additive
character of $R_{e,m}$ and let $f(x)$ be a non-degenerate polynomial
(i.e. no term in $f(x)$ has degree which is a multiple of $p$) over
$R_{e,m}$ with weighted degree [1] $D_f$. Define
$T_{e,m} = T^*_{e,m} \cup \{0\}$, where $T^*_{e,m}$ is a cyclic
subgroup (the Teichmuller group) of order $p^m - 1$ of $R^*_{e,m}$. In
[1], Kumar et al. prove

$$\left| \sum_{x \in T_{e,m}} \psi_{e,m}(f(x)) \right| \le (D_f - 1)\sqrt{p^m}. \qquad (1)$$

This bound leads to new sequence families which compare very well with
existing sequence families when maximum non-trivial correlation,
alphabet size and family size are used as a basis for comparison. More
precisely, let $\mathcal{F}_D$ denote a maximal family of pairwise
cyclically distinct sequences, each having period $p^m - 1$, from the
set

$$\{ \{\psi_{e,m}(f(\beta^t))\}_{t \in \mathbb{Z}} \mid D_f \le D \},$$

where $\beta$ is a generator of $T^*_{e,m}$ and $f(x) \in R_{e,m}[x]$
has weighted degree $D_f$. We then have the following bounds for the
maximum non-trivial correlation $C_{\max}$ and the size of the family
$\mathcal{F}_D$:

$$C_{\max} \le 1 + (D - 1)\sqrt{p^m}$$

and 

\FD\  >pm(£,_l^'J_1). 

In this paper we obtain an upper bound for the extended Kloosterman
sums, i.e. sums of the form

$$K_{e,m}(f_1, f_2) = \sum_{x \in T^*_{e,m}} \psi_{e,m}(f_1(x) + f_2(x^{-1})),$$

where $f_1(x)$, $f_2(x)$ are polynomials over $R_{e,m}$. These sums
lead to new sequence designs for CDMA applications.

II.  Bound  on  the  extended  Kloosterman  sum 

Let $f_1(x), f_2(x) \in R_{e,m}[x]$ have weighted degrees $D_{f_1}$
and $D_{f_2}$ respectively. Let $\psi_{e,m}$ be any non-trivial
additive character of $R_{e,m}$. Using L-function techniques, we can
express the exponential sum
$\sum_{x \in T^*_{e,m}} \psi_{e,m}(f_1(x) + f_2(x^{-1}))$ as a sum of
$D_{f_1} + D_{f_2}$ complex numbers. These complex numbers can be
shown to be the reciprocal roots of the zeta function of a function
field over $F_{p^m}$. It follows from the Riemann Hypothesis for
function fields that the magnitude of each of these complex numbers is
$\sqrt{p^m}$. Thus, using notation as above, we obtain the following
theorem:

Theorem 1

$$\left| \sum_{x \in T^*_{e,m}} \psi_{e,m}(f_1(x) + f_2(x^{-1})) \right| \le (D_{f_1} + D_{f_2})\sqrt{p^m}.$$

⁰ The work was supported in part by the National Science Foundation
under Grant Number NCR-93-05017 and in part by the Norwegian Research
Council for Science and the Humanities.

III.  Applications  to  sequence  designs 

We now restrict ourselves to the case when $p = 2$. Consider the set
$S_{D_1,D_2}$ of sequences defined via

$$S_{D_1,D_2} = \{ \{\psi_{e,m}(f_1(\beta^t) + f_2(\beta^{-t}))\} \mid D_{f_1} \le D_1,\ D_{f_2} \le D_2 \},$$

where $\beta$ is a generator of $T^*_{e,m}$ and
$f_i(x) \in R_{e,m}[x]$, $i = 1, 2$, is non-degenerate with weighted
degree $D_{f_i}$. Let the set

$$\mathcal{F}_{D_1,D_2} \subset S_{D_1,D_2}$$

consist of a maximal family of pairwise cyclically distinct sequences
in $S_{D_1,D_2}$, with each sequence having period $2^m - 1$. Using
Theorem 1, it is easy to see that the maximum non-trivial correlation
$C_{\max}$ of the family $\mathcal{F}_{D_1,D_2}$ is upper bounded via

$$C_{\max} \le 1 + (D_1 + D_2)\sqrt{2^m}. \qquad (2)$$

The size of the family $\mathcal{F}_{D_1,D_2}$ can be lower bounded
using the formula below:

\FDi,Di\  >  +  .  (3) 


Note  that 


D\  -J-  D2  +  1 


sl£j  +  l£]+i. 


In case of equality, we note that the corresponding bounds for the
maximum non-trivial correlation and family size of
$\mathcal{F}_{D_1 + D_2 + 1}$ and $\mathcal{F}_{D_1,D_2}$ are equal.

Abstract - For the application in a cellular common-code
spread-spectrum multiple access system, here referred to as CTDMA, the
ambiguity function of the binary spreading sequence and its mismatched
despreading filter is optimized.

I.  Introduction 

In cellular Code Time Division Multiple Access (CTDMA) systems [1],
the user signals are first separated by a symbol-level TDMA scheme and
then spread in a DS-CDMA fashion by a common (cell-specific) binary
spreading sequence $s[\cdot]$ of length $L$ with $s[n] \in \{-1, +1\}$
for $0 \le n < L$ and zero otherwise. At the receiver, the incoming
signal is passed through the aperiodic inverse filter $v[\cdot]$ of
$s[\cdot]$, which completely separates the users of the same cell and
thus avoids this kind of interference that usually appears in cellular
CDMA. The filter $v[\cdot]$, two-sided infinite in length, is well
approximated by a filter $w[\cdot]$ of length $N \approx 3L$, which
still achieves a sufficient user separation by minimizing the
aperiodic correlation sidelobes
$C_{sw}[m] = \sum_n s[n]\, w[n + m]$, $m \ne 0$. Different techniques
to design $w[n]$ have been discussed in the literature: truncation of
$v[\cdot]$ is conceptually simple [2], linear programming (LP)
optimizes the peak/off-peak (POP) ratio
$\rho_{sw} = C_{sw}[0] / \max_{m \ne 0} |C_{sw}[m]|$, and the
least-square (LS) algorithm minimizes the sidelobe energy of
$C_{sw}[\cdot]$.

II.  System  Analysis 

These filter design techniques neglect possible Doppler frequency
shifts that occur in cellular applications due to velocity differences
$\Delta v$. The corresponding effect at the receiver output is
described by the ambiguity function

$$A_{sw}[m, \xi] = \sum_n s[n]\, w[n + m]\, e^{j 2\pi \xi n}$$

with $\xi = T_c f_d$. Here, $T_c$ is the chip duration,
$f_d = 2\Delta v f_0 / c$ the Doppler shift, $f_0$ the carrier
frequency and $c \approx 3 \cdot 10^8$ m/s the speed of light. With
$f_0 \approx 2$ GHz, $T_c \approx 1$ μs and
$\Delta v_{\max} = 30 \ldots 300$ m/s, maximum values of
$\xi_{\max} \approx 5 \cdot 10^{-4} \ldots 5 \cdot 10^{-3}$ are
obtained. In order to investigate the degradation due to these Doppler
shifts (for other results, especially for larger Doppler shifts, cf.
[3, 4]), we have computed the generalized POP ratio

$$\rho_{sw}(\xi_{\max}) = \frac{\min_{|\xi| \le \xi_{\max}} |A_{sw}[0, \xi]|}{\max_{m \ne 0,\ |\xi| \le \xi_{\max}} |A_{sw}[m, \xi]|},$$

where the filters $w[\cdot]$ of length $N = 3L$ have been determined
using the LP technique. For $\xi_{\max} > 10^{-4}$, the Doppler effect
causes a noticeable degradation of $\rho_{sw}(\xi_{\max})$, and for
$\xi_{\max} = 5 \cdot 10^{-3}$ the loss in $\rho_{sw}(\xi_{\max})$ can
exceed 20 dB, as shown in the table below that lists the
$\rho_{sw}(\xi_{\max})$ values. Especially sequences with best noise
performance [5] (cf. #1, #2, #3), which also provide very good
$\rho_{sw}(0)$ values, seem to be Doppler sensitive. Others (cf. #4)
are less sensitive.


#  L   s[·] (hex)  filter                   pedestrian   car             train           airplane
                                            ≈ 0 m/s      ≈ 30 m/s        ≈ 60 m/s        ≈ 300 m/s
                                            ξmax = 0     ξmax = 5·10^-4  ξmax = 1·10^-3  ξmax = 5·10^-3
1  20  05D39       w[·]                     40.070 dB    37.97 dB        34.64 dB        22.00 dB
2  25  073F536     w[·]                     40.828 dB    37.90 dB        33.97 dB        20.86 dB
3  30  09BF8EB5    w[·]                     42.408 dB    38.35 dB        33.84 dB        20.34 dB
4  15  2DE4        w[·]                     30.982 dB    30.79 dB        30.25 dB        23.49 dB
5  15  2980        w[·]                     41.279 dB    38.89 dB        35.33 dB        22.56 dB
6  15  2980        w[·] (Doppler tolerant)  40.100 dB    39.11 dB        36.26 dB        25.33 dB

III.  Doppler  Tolerant  Filters 

We will first search for sequences $s[\cdot]$ with large
$\rho_{sw}(\xi_{\max})$ values and then design receiver filters
$w[\cdot]$ of length $N \approx 3L$ with optimized Doppler
performance. To simplify the search in the first step, the ambiguity
function is expressed as

$$|A_{sw}[m, \xi]|^2 = \sum_n \sum_l s[n]\, s[l]\, w[n + m]\, w[l + m] \cos(2\pi\xi(n - l))$$

$$\approx C_{sw}^2[m] - 4\pi^2 \xi^2 \left( C_{sw}^{(2)}[m]\, C_{sw}[m] - (C_{sw}^{(1)}[m])^2 \right),$$

where we approximated $\cos(x) \approx 1 - x^2/2$ ($|x| < 0.1$ yields
less than 5% error) and where
$C_{sw}^{(k)}[m] = \sum_n s[n]\, w[n + m]\, n^k$. Since this is a
quadratic function of $\xi$, only the cases $\xi = 0$ and
$\xi = \xi_{\max}$ must be considered. Moreover, this approximation
leads to an efficient criterion that allows an exhaustive search up to
lengths $L \approx 40$.

In the second step, we determined Doppler tolerant filters $w[n]$ by
adding constraints on
$|\sum_n s[n]\, w[n + m] \cos(2\pi \xi_{\max} n)|$ and
$|\sum_n s[n]\, w[n + m] \sin(2\pi \xi_{\max} n)|$, or on
$|C_{sw}^{(k)}[m]|$. Both approaches result in a reduced degradation
of $\rho_{sw}(\xi_{\max})$ with increasing $\xi_{\max}$. For
$\xi_{\max} = 5 \cdot 10^{-3}$, the improvement of
$\rho_{sw}(\xi_{\max})$ may exceed 3 dB (compare #5 with #6).
Nevertheless, the complete degradation of $\rho_{sw}(\xi_{\max})$
caused by Doppler frequency shifts cannot be compensated by mismatched
filters.
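
The ambiguity function, its quadratic approximation in the normalized Doppler shift, and the generalized POP ratio can be evaluated directly for short sequences. The sketch below makes several assumptions for illustration: the filter is indexed from 0 and aligned so that the main peak sits at m = 0, the Doppler term is taken as exp(j 2 pi xi n), the Doppler range is sampled on a uniform grid, and a Barker sequence with its matched filter stands in for the LP-designed mismatched filters, which are not reproduced here. All names are hypothetical.

```python
import cmath
import math

def ambiguity(s, w, m, xi):
    """A_sw[m, xi] = sum_n s[n] w[n+m] exp(j 2 pi xi n), w zero outside its support."""
    total = 0j
    for n, sn in enumerate(s):
        if 0 <= n + m < len(w):
            total += sn * w[n + m] * cmath.exp(2j * math.pi * xi * n)
    return total

def moment_correlation(s, w, m, k):
    """C^(k)_sw[m] = sum_n s[n] w[n+m] n^k (k = 0 gives the plain correlation)."""
    return sum(sn * w[n + m] * n ** k
               for n, sn in enumerate(s) if 0 <= n + m < len(w))

def ambiguity_sq_approx(s, w, m, xi):
    """Quadratic-in-xi approximation of |A_sw[m, xi]|^2 from cos(x) ~ 1 - x^2/2."""
    c0, c1, c2 = (moment_correlation(s, w, m, k) for k in range(3))
    return c0 ** 2 - 4 * math.pi ** 2 * xi ** 2 * (c2 * c0 - c1 ** 2)

def generalized_pop_ratio(s, w, xi_max, steps=11):
    """min_xi |A[0, xi]| / max_{m != 0, xi} |A[m, xi]| on a uniform xi grid."""
    xis = [xi_max * (2 * i / (steps - 1) - 1) for i in range(steps)]
    peak = min(abs(ambiguity(s, w, 0, xi)) for xi in xis)
    shifts = [m for m in range(-(len(s) - 1), len(w)) if m != 0]
    side = max(abs(ambiguity(s, w, m, xi)) for m in shifts for xi in xis)
    return peak / side

# Length-13 Barker sequence, used with its matched filter as a stand-in.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
```

With an odd number of grid points the grid contains xi = 0, so the ratio computed for a nonzero Doppler range can never exceed the ratio at zero Doppler.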

Abstract - Welch's lower bounds on total periodic and odd correlation
energy of an equi-energy set of sequences are presented. It is shown
that both bounds are simultaneously achieved precisely when the
sequence set forms an aperiodic complementary sequence set, which has
been extensively studied and is of independent interest. Then a lower
bound closely related to an approximate SNR formula of Pursley for
asynchronous DS/SSMA is derived. Our results are an extension of the
works of Massey and Mittelholzer for synchronous DS/SSMA.

I.  Introduction 

Although the existing theory of sequence design is concerned mainly
with the maximum periodic correlation magnitude, it is well known that
the inter-sequence aperiodic cross-correlation energies (i.e. between
any two users) are of greater practical interest than the maximum
periodic (or even aperiodic) cross-correlation magnitude, because they
determine the average SNR of an asynchronous DS/SSMA system under
proper assumptions [6], [7].

In  order  to  maximize  the  average  SNR  of  an  asynchronous 
DS/SSMA  system  by  proper  choice  of  signature  sequences, 
sets  of  binary  sequences  are  typically  numerically  optimized 
with  respect  to  the  average  interference  parameter  (AIP), 
which  can  be  accurately  approximated  by  the  total  aperiodic 
cross-correlation  energy  (i.e.  sum  over  all  pairs  of  distinct  se¬ 
quences).  In  the  last  two  decades,  many  numerical  results 
about  binary  sequences  with  optimized  AIP  were  reported 
(cf. [2] and the references therein).

Welch’s bound is essentially a lower bound on the total
even moment of inner products of any set of equi-energy
sequences, though it is usually formulated as a bound on the
maximum inner-product magnitude. Recently, Massey [3]
identified the necessary and sufficient condition for a sequence
set to meet Welch’s bound on the total inner-product energy.
This result was subsequently elaborated by Massey
and Mittelholzer [4] for application in synchronous DS/SSMA
systems. In particular, the uniformly good property of the
Welch-Bound-Equality (WBE) sequence sets guarantees that
all inter-sequence inner-product energies of such sequence sets
simultaneously achieve the same value. This property means
that the use of a WBE sequence set as the signature sequences
for a synchronous DS/SSMA system results in the minimum
worst-case interuser interference variance, which is very
desirable from an application viewpoint.

II.  Main  Results 

This  work  is  an  extension  of  the  results  of  [3]  and  [4]  to 
asynchronous  DS/SSMA  systems,  which  are  considered  to  be 
more  practical  due  to  the  removal  of  the  assumption  of  ideal 
sequence  synchronization.  The  following  theorems  state  our 
main  results. 

1This  work  was  supported  by  the  Croucher  Foundation  Fellow¬ 
ship  1994/95. 


Let X be an equi-energy set of K complex-valued sequences
of length L.

Theorem 1 (Welch’s bound on total periodic correlation energy)
Let Xs be the sequence set obtained by including all cyclically
shifted versions of every sequence in X. Then the total
inner-product energy of Xs is at least $K^2 L^3$, with equality if
and only if X is a periodic complementary sequence set.

Theorem 2 (Welch’s bound on total odd correlation energy)
Let Xg be the sequence set obtained by including all negacyclically
shifted versions of every sequence in X. Then the total
inner-product energy of Xg is at least $K^2 L^3$, with equality if
and only if X is an odd complementary sequence set.


Theorem 3 (Bound for asynchronous DS/SSMA) Let $C_{i,j}(t)$
denote the aperiodic cross-correlation at phase shift t between
the ith and jth sequences in X. Then

$$\max_{0 \le j < K} \left\{ \sum_{\substack{i=0 \\ i \ne j}}^{K-1} \sum_{t=1-L}^{L-1} |C_{i,j}(t)|^2 + \sum_{\substack{t=1-L \\ t \ne 0}}^{L-1} |C_{j,j}(t)|^2 \right\} \ge (K-1)L^2,$$

with equality if and only if the sequence set forms an aperiodic
complementary sequence set.
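
The bound of Theorem 3 can be tried out on small sequence sets. The sketch below is illustrative only: it assumes the convention $C_{x,y}(t) = \sum_n x[n]\,y^*[n+t]$ for the aperiodic correlation, takes the first sum over i ≠ j, and uses unit-magnitude sequences so that each has energy L. The length-2 Golay pair and the function names are choices made here, not taken from the paper.

```python
def aperiodic_corr(x, y, t):
    """C_{x,y}(t) = sum_n x[n] * conj(y[n + t]) over the overlapping indices."""
    L = len(x)
    return sum(x[n] * complex(y[n + t]).conjugate()
               for n in range(L) if 0 <= n + t < L)

def theorem3_lhs(X):
    """Max over j of (cross-correlation energy into j) + (off-peak autocorrelation energy of j)."""
    K, L = len(X), len(X[0])
    shifts = range(1 - L, L)
    best = 0.0
    for j in range(K):
        cross = sum(abs(aperiodic_corr(X[i], X[j], t)) ** 2
                    for i in range(K) if i != j for t in shifts)
        auto = sum(abs(aperiodic_corr(X[j], X[j], t)) ** 2
                   for t in shifts if t != 0)
        best = max(best, cross + auto)
    return best

def theorem3_rhs(X):
    K, L = len(X), len(X[0])
    return (K - 1) * L ** 2

golay_pair = [[1, 1], [1, -1]]   # aperiodic complementary: autocorrelations cancel off-peak
repeated = [[1, 1], [1, 1]]      # not complementary
```

For the Golay pair the left-hand side meets the bound (K − 1)L² = 4 exactly, while the repeated pair gives a strictly larger value, in line with the equality condition of the theorem.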


Theorem  3  is  closely  related  to  the  approximate  SNR  for¬ 
mula  of  Pursley [6]  for  asynchronous  DS/SSMA.  Preliminary 
forms  of  Theorems  1  and  2  were  presented  in  [5].  A  discus¬ 
sion  on  binary  linear  cyclic  codes  that  almost  achieve  Welch’s 
bound  on  total  periodic  correlation  energy  can  be  found  in  [1]. 
Abstract —

A method of “coded addition of sequences” is proposed
for signal design with many codewords for
synchronous or approximately synchronized CDMA
systems.

I.  Coded  Addition  of  Sequences 

The  method  can  be  explained  using  small  examples. 

We  can  obtain  a  4-phase  good  code  of  wordlength  2 


where 

x11 = [ w^17 w^3 w^1 w^11 w^9 w^19 w^17 w^3 ],

when L = 1.

The Euclidean distance between any two among [x11 ... x18]
is the same, except when the two are inverses of
each other. On the other hand, the crosscorrelation function
between x1i and x2i' is 0 for the -1, 0 and 1 shift terms.


II.  Discussion 


Then,  from  orthogonal  vectors 

Xi  =  (  1  1  1  — 1  ) 

X2  =  (  1  1  -1  1  ), 

we  can  obtain  eight  vectors  by  “coded  addition”  of  vectors 
with  above  4-phase  code  as  follows: 


  x1 + jx2 = (  1+j    1+j    1-j   -1+j )
  x1 - jx2 = (  1-j    1-j    1+j   -1-j )
  jx1 + x2 = (  1+j    1+j   -1+j    1-j )
 -jx1 + x2 = (  1-j    1-j   -1-j    1+j )
 -x1 - jx2 = ( -1-j   -1-j   -1+j    1-j )
 -x1 + jx2 = ( -1+j   -1+j   -1-j    1+j )
 -jx1 - x2 = ( -1-j   -1-j    1-j   -1+j )
  jx1 - x2 = ( -1+j   -1+j    1+j   -1-j )

For a synchronous CDMA system, a signal design without
co-channel interference is realized by using rows of a unitary
matrix. For an approximately synchronized CDMA system,
a signal design without co-channel interference is also realized
by using the pseudo-periodic sequences proposed by the
author [1].

However, in a real system, the information transmission rate
and the number of users who can use the system at the
same time are important. So, a user should be assigned many
signals, each of which is free of co-channel interference with
the signals of other users, so that the user can use the assigned
signals as codewords.

In this paper, a method of “coded addition of sequences”
was proposed for signal design with many codewords for
synchronous or approximately synchronized CDMA systems.


The Euclidean distance between any two of these vectors is always
4, except when the two vectors are inverses of each other.
Furthermore, all of these vectors are orthogonal to both of

x3 = ( 1 -1 1 1 )

x4 = ( -1 1 1 1 ).
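
The distance and orthogonality claims for the eight “coded addition” vectors are easy to confirm by enumeration. This is a small sketch rather than the author's procedure; it assumes x4 = (-1, 1, 1, 1), the sign pattern that makes x4 orthogonal to x1, x2 and x3, and the helper names are chosen here for illustration.

```python
import itertools
import math

x1 = (1, 1, 1, -1)
x2 = (1, 1, -1, 1)
x3 = (1, -1, 1, 1)
x4 = (-1, 1, 1, 1)   # assumed sign pattern (orthogonal to x1, x2, x3)

# Coefficient pairs (a, b) of the eight codewords a*x1 + b*x2.
coeffs = [(1, 1j), (1, -1j), (1j, 1), (-1j, 1),
          (-1, -1j), (-1, 1j), (-1j, -1), (1j, -1)]
codewords = [tuple(a * u + b * v for u, v in zip(x1, x2)) for a, b in coeffs]

def distance(u, v):
    """Euclidean distance between two complex vectors."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(u, v)))

def inner(u, v):
    """Hermitian inner product <u, v> = sum u_k * conj(v_k)."""
    return sum(p * complex(q).conjugate() for p, q in zip(u, v))

# Distinct pairwise distances: one value for ordinary pairs, one for inverse pairs.
distances = sorted({round(distance(u, v), 9)
                    for u, v in itertools.combinations(codewords, 2)})
orthogonal = all(abs(inner(c, x)) < 1e-12 for c in codewords for x in (x3, x4))
```

Only two distinct distances occur: 4 for ordinary pairs and 4√2 for the four pairs of mutually inverse vectors.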

The above method of “coded addition of vectors” can also be
applied to the row vectors of the IDFT matrix in the following
formula from the signal design method for approximately
synchronized CDMA [1]. Because (1 j) is also an orthogonal
sequence, a formula
X2d]

prepares  eight  polyphase  codewords  for  a  user,  where  w  zz 
exp(^-).  In  this  case,  the  user  1  can  be  assigned  8  pseudo- 
periodic  sequences  of  length  6  +  2 L: 


[x11 x12 x13 x14 x15 x16 x17 x18],
Abstract — An upper bound for a hybrid exponential
sum over Galois rings is derived. This bound is
then used to obtain an upper bound for the maximum
aperiodic correlation of some recently constructed
weighted degree sequence families over Galois rings.
The bound is of the order of $\sqrt{L}\log L$, where L is the
period of the sequences.

I.  Introduction 

For a fixed prime p and integers e, m with $e \ge 2$, $m \ge 1$, let
$R_{e,m}$ denote the Galois ring of characteristic $p^e$ containing
$p^{em}$ elements. Let $\psi_{e,m}$ be a non-trivial additive character
of $R_{e,m}$ and let $f(x) \in R_{e,m}[x]$ be non-degenerate with
weighted degree $D_f$ [1]. Define $T_{e,m} = T^*_{e,m} \cup \{0\}$, where $T^*_{e,m}$
is a cyclic subgroup of $R^*_{e,m}$ of order $p^m - 1$. In [1], Kumar
et al. prove

$$\left|\sum_{x \in T_{e,m}} \psi_{e,m}(f(x))\right| \le (D_f - 1)\sqrt{p^m}. \qquad (1)$$

Consider the set $S_D$ of sequences defined via
$$S_D = \{\{T_{e,m}(f(\beta^t))\}_{t \in Z} \mid D_f \le D\},$$
where $\beta$ is a generator of $T^*_{e,m}$. Let the set
$\mathcal{F}_D \subset S_D$
consist of a maximal family of pairwise, cyclically distinct
sequences in $S_D$ with each sequence having period $p^m - 1$. Using
(1), it is easy to see that the maximum non-trivial correlation
$C_{max}$ of the family $\mathcal{F}_D$ has the upper bound

$$C_{max} \le 1 + (D - 1)\sqrt{p^m}.$$

The family $\mathcal{F}_D$ compares very well with existing sequence
families when $C_{max}$, alphabet size and family size are used
as a basis for comparison. In this paper, we obtain an upper
bound on the maximum aperiodic correlation of the family $\mathcal{F}_D$.
The aperiodic correlation is often more relevant than periodic
correlation in CDMA applications.

II.  Bound  on  a  hybrid  exponential  sum 

Let $f(x) \in R_{e,m}[x]$ have weighted degree $D_f$. Let $\chi_{e,m}$
be an arbitrary multiplicative character with order dividing
$p^m - 1$. Using L-function techniques, we can express the hybrid
exponential sum $\sum_{x \in T_{e,m}} \psi_{e,m}(f(x))\,\chi_{e,m}(x)$ as a sum of
$D_f$ complex numbers. These complex numbers can be shown

2The  work  was  supported  in  part  by  the  National  Science  Foun¬ 
dation  under  Grant  Number  NCR-93-05017  and  in  part  by  the  Nor¬ 
wegian  Research  Council  for  Science  and  the  Humanities. 


to be the reciprocal roots of the zeta function of a function
field over $F_{p^m}$. It follows from the Riemann Hypothesis for
function fields that the magnitude of each of these complex
numbers is $\sqrt{p^m}$. Thus, we have

Theorem 1

$$\left|\sum_{x \in T_{e,m}} \psi_{e,m}(f(x))\,\chi_{e,m}(x)\right| \le D_f \sqrt{p^m}.$$

III.  Bound  on  aperiodic  correlation 

The aperiodic correlation $\theta_{1,2}(\tau)$ between any two $p^e$-ary
sequences $s_1(t)$ and $s_2(t)$ of period N is defined via

$$\theta_{1,2}(\tau) = \sum_{t=\max\{0,-\tau\}}^{\min\{N-1,\,N-1-\tau\}} \omega^{s_1(t+\tau) - s_2(t)}, \qquad \omega = \exp(i2\pi/p^e).$$
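
The definition of the aperiodic correlation translates directly into a few lines of code. The sketch below is illustrative: the parameter `q` stands for the alphabet size $p^e$, and the function name and the toy quaternary sequence are assumptions made for the example, not sequences from the family discussed here.

```python
import cmath

def aperiodic_correlation(s1, s2, tau, q):
    """theta_{1,2}(tau) for q-ary sequences given over one period of length N."""
    N = len(s1)
    omega = cmath.exp(2j * cmath.pi / q)
    lo = max(0, -tau)
    hi = min(N - 1, N - 1 - tau)
    return sum(omega ** (s1[t + tau] - s2[t]) for t in range(lo, hi + 1))

# Toy quaternary example (q = 4, so omega = i).
seq = [0, 1, 2, 3]
theta_0 = aperiodic_correlation(seq, seq, 0, 4)    # in-phase value equals N
theta_1 = aperiodic_correlation(seq, seq, 1, 4)    # three overlapping terms, each omega^1
```

Unlike the periodic correlation, the sum runs only over the overlapping part of the two sequences, which is why shorter overlaps contribute fewer terms.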

The computation of the aperiodic correlation distribution
$\theta_{i,j}(\tau)$, $1 \le \tau < N$, of $\mathcal{F}_D$ reduces to
obtaining the distribution of the exponential sum values

{Er=S wUT}  M/m  1  d,  <  d}. 

Using Theorem 1, and using techniques similar to those in [2]
(see also [4], [3]), we can bound the maximum non-trivial
aperiodic correlation $\theta_{max}$ of $\mathcal{F}_D$ as follows:

Theorem 2

$$|\theta_{max}| \le D\sqrt{p^m}\,(\ln(p^m) + 1).$$
Abstract — A new method for designing signals with
a given time-bandwidth product is introduced. The
signals in the set have a flat amplitude spectrum,
have low cross-correlation function values, and lie on
an ellipse in the signal parameter space. Upper bounds
for the cross-correlation between signals in the set are
derived.


I.  Introduction 

There  are  many  applications  where  there  is  a  need  for  syn¬ 
thesizing  signal  sets  which  have  low  values  of  cross-correlation 
at  all  lags  and  low  values  of  autocorrelation  at  nonzero  lags. 
While  prolate  spheroidal  functions  are  “essentially”  time  and 
band-limited,  and  are  orthogonal,  the  cross-correlation  be¬ 
tween  the  signals  is  not  zero  for  all  lags  [1].  In  an  asynchronous 
system  with  no  cooperation  among  users/targets,  uniformly 
low  cross-correlation  values  between  signals  are  important.  In 
an  imaging  context,  using  signals  whose  spectrum  is  broad 
enough  to  cover  the  nulls  in  the  backscattering  spectrum  of 
targets  ensures  reasonable  signal  to  noise  ratio  [2]. 

II.  Design  Constraints 

Let  S  =  {si(t),s2(t),...,sN(t)}  be  a  set  of  complex  envelopes 
of  signals  which  are  y)  with  a  corresponding  set  of 

Fourier transforms $\mathcal{S} = \{S_1(f), S_2(f), \ldots, S_N(f)\}$. The design

specifications  are  as  follows: 

Condition 1:

$$\int |s_i(t)|^2\, dt = 1; \quad i = 1, 2, \ldots, N \qquad (1)$$


Condition  2:  For  some  k  >  0  and  for  all  r  <  T 


(2) 


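where the cross-correlation $R_{i,j}(\tau)$ between signals $s_i(t)$ and
$s_j(t)$ is defined as

$$R_{i,j}(\tau) = \int_{-T}^{+T} s_i(t)\, s_j^*(t - \tau)\, dt; \quad |\tau| \le T$$

Condition 3: For i = 1,2,...,N

$$|S_i(f)| = \begin{cases} \alpha_1, & |f| \le W \\ \alpha_2(f), & |f| > W \end{cases} \qquad (3)$$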


where $\alpha_1$ is a constant and $\alpha_2(f)$ is a positive function. Let
$\alpha_1 = 1/\sqrt{2W} - \delta_1$ and $\alpha_2(f) = \delta_2$, where $\delta_1, \delta_2 > 0$ are very
small, such that $\int_{-W}^{W} |S_i(f)|^2\, df = 1 - \epsilon$. The signals are
“essentially” band-limited with the amplitude of the Fourier
transform as specified.

Since  the  area  under  the  squared  magnitude  of  the  cross- 
correlation  function  is  fixed  because  of  (3),  it  can  be  reasoned 
that  the  cross-correlation  function  should  be  a  constant  func¬ 
tion with a support [-T, T] to achieve uniformly low values of
cross-correlation.

1  This  work  was  supported  by  the  National  Science  Foundation 
under  grant  OCE  89-14300 


III.  Solution  to  the  Design  problem 

It  has  been  shown  heuristically  that  for  signals  with  quadratic 
phase  functions  in  the  time  and  frequency  domains  the  shape 
of  the  complex  envelopes  will  be  rectangular  [3].  Let 


$$S_i(f) = |S_i(f)|\, e^{j(a_i f^2 + b_i f + c_i)} \qquad (4)$$


By  selecting  the  quadratic  coefficients  carefully  we  can  also 
ensure  that  the  difference  between  the  phase  functions  of  two 
signals,  which  determines  the  cross-correlation  property,  is 
quadratic. To arrive at a rule to pick the quadratic coefficients,
the usual definitions of the rms duration $\gamma$ and rms
bandwidth $\beta$ are used [3]. Using these definitions, it can be
shown  that  the  quadratic  coefficients  lie  on  an  ellipse,  i.e., 


=  1 


(5) 


IV.  Upper  Bound  for  Cross-Correlation 

Let the real and imaginary parts of a Fresnel integral be
$C(x) = \int_0^x \cos(\pi t^2/2)\, dt$ and $S(x) = \int_0^x \sin(\pi t^2/2)\, dt$. $R_{i,j}$ is
a continuous function of $\tau$, $\Delta a$, $\Delta b$ and $\Delta c$, where $\Delta a =
a_i - a_j$; $\Delta b = b_i - b_j$; $\Delta c = c_i - c_j$. Since $R_{i,j}(\tau)$ is
a continuous function, so is $|R_{i,j}(\tau)|$. This means that
$\max_{|\tau| \le T} |R_{i,j}(\tau)|$ exists and is finite.

Theorem :  maxjr|<T  |  Ri,j(r)  |<  2. 3(Ap  v/ife) 

Proof: 

I  RiAr)  1=  2^  \f )+jS(x1  )-C(xo)-jS(xo)]  (6) 

where  _ 

*i  =  yS7((2 *r  +  Ab)  +  W/\a)  and  x0  =  + 

A b)  -  VKAa).  maxIOiI1  e(-os, +<»)[!  C(®i)  -  C(x0)  |]  <  1-6. 
Also, $\max_{x_0, x_1 \in (-\infty,+\infty)} |S(x_1) - S(x_0)| < 1.6$. Thus

$|C(x_1) + jS(x_1) - C(x_0) - jS(x_0)| < 2.3$
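
The numerical constants 1.6 and 2.3 can be sanity-checked by integrating the Fresnel integrands directly. The sketch below is an illustration under stated assumptions: it uses a simple midpoint rule, scans only the finite range |x| ≤ 10, and relies on C and S being odd functions, so that the largest difference is twice the largest magnitude. The function name and step size are choices made here.

```python
import math

def fresnel_extremes(x_max=10.0, step=1e-3):
    """Largest |C(x)| and |S(x)| on [0, x_max] using midpoint-rule integration."""
    c = s = 0.0
    c_max = s_max = 0.0
    for k in range(int(round(x_max / step))):
        t = (k + 0.5) * step                      # midpoint of the k-th subinterval
        c += math.cos(math.pi * t * t / 2) * step
        s += math.sin(math.pi * t * t / 2) * step
        c_max = max(c_max, abs(c))
        s_max = max(s_max, abs(s))
    return c_max, s_max

c_max, s_max = fresnel_extremes()
# C and S are odd, so the largest difference over x0, x1 is twice the largest magnitude.
c_spread, s_spread = 2 * c_max, 2 * s_max
combined = math.hypot(c_spread, s_spread)
```

On the scanned range both spreads stay below 1.6, and combining them as real and imaginary parts stays below 2.3, consistent with the constants used in the proof.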

Corollary:  For  a  given  duration  and  bandwidth  for  the  signal 
set ,  the  cross- correlation  between  two  signals  that  are  furthest 
apart  along  the  semi-minor  axis,  in  the  set  is  bounded  by  ^ ~ . 

It  can  be  shown  that  it  is  possible  to  trade-off  the  num¬ 
ber  of  signals  on  the  signal  parameter  ellipse  for  better  cross¬ 
correlation  properties  between  signals  in  the  set. 
ON GROBNER BASES OF THE ERROR-LOCATOR IDEAL OF HERMITIAN CODES
1. ERROR-LOCATOR IDEALS FOR HERMITIAN CODES
Consider error-correcting codes constructed from an affine version
of the Hermitian curve. Let K = GF(q) and let $m = \sqrt{q} + 1$
be an integer. In this case the affine version of the Hermitian
curve, $C(x,y) = x + x^{m-1} - y^m$, is irreducible, regular,
and has exactly $n = q\sqrt{q}$ rational points, given by
$P_n = \{(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)\}$. The genus g of this curve
is given by $g = (m - 1)(m - 2)/2$. The total degree ordering (TDO) $<_T$ of
the pairs (a,b) of the positive integers is chosen as follows:

$(0,0) <_T (1,0) <_T (0,1) <_T (2,0) <_T (1,1) <_T (0,2) <_T \cdots$
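
The total degree ordering listed above (increasing total degree a + b, and within one degree decreasing exponent of x) can be generated as follows. This is a small illustrative sketch; the function name is chosen here.

```python
def total_degree_order(max_degree):
    """Exponent pairs (a, b) in total degree order, up to and including (0, max_degree)."""
    return [(d - b, b) for d in range(max_degree + 1) for b in range(d + 1)]

tdo_pairs = total_degree_order(2)
```

With this convention the last pair of total degree j is (0, j), so `total_degree_order(j)` lists exactly the exponent pairs that do not exceed (0, j) in the ordering.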

In  the  TDO  let  j  be  a  positive  integer  such  that  m  —  2  <  j  < 
and let $\phi_0(x,y), \phi_1(x,y), \ldots, \phi_u(x,y)$ denote the monomials $x^a y^b$ for
$(a,b) \le_T (0,j)$. The Hermitian code C is then defined by its parity
check matrix H:

$$H = \begin{pmatrix} \phi_0(x_1,y_1) & \cdots & \phi_0(x_n,y_n) \\ \phi_1(x_1,y_1) & \cdots & \phi_1(x_n,y_n) \\ \vdots & & \vdots \\ \phi_u(x_1,y_1) & \cdots & \phi_u(x_n,y_n) \end{pmatrix} \qquad (1)$$

The dimension and the designed distance of the code C satisfy $k =
n - (mj - g + 1)$ and $d^* = mj - 2g + 2 \le d$, respectively, where d
denotes the true minimum distance of the code C.

In the decoding situation a received word r is the sum of a codeword
c and an error vector e. The syndrome vector s is computed
as usual by $s = rH^T$. Assume that $v = wt(e) \le t$, where
$t = \lfloor (d - 1)/2 \rfloor$. Also, assume that an error which occurs in the
i-th coordinate of r is denoted by $e_i (\ne 0)$. Then the error-location
set of e is defined by $EP_{xy} = \{(x_i, y_i) : i \in Z_n \text{ and } e_i \ne 0\}$,
where $Z_n = \{i : 1 \le i \le n\}$. It follows from (1) that $s_{ab} =
\sum_{i \in Z_e} e_i x_i^a y_i^b$ for $a + b \le j$ are the known syndromes for the
errors of the Hermitian code C, where $Z_e = \{i : i \in Z_n \text{ and } e_i \ne 0\}$ is
called the error-location index set. The decoding problem is to use
these syndromes $s_{ab}$ to determine the $v (\le t)$ error positions $(x_i, y_i)$
and the corresponding error values $e_i$ for $i \in Z_e$.

Usually, the determination of the error positions is based on the
observation that if any polynomial, $f(x,y) = \sum_{v+w \le h} f_{vw} x^v y^w$,
has the same error positions as the received word among its zeros,
then $\sum_{v+w \le h} f_{vw}\, s_{a+v,b+w} = 0$. This implies that the procedure
for determining the error positions is independent of the method
needed to find the error values. The error-locator ideal of e is defined
next.

Definition 1 The polynomial ideal,

$I_e(x,y) = \{f(x,y) \in K[x,y] : f(x_i,y_i) = 0 \text{ for all } i \in Z_e\}$,

is called the error-locator ideal of the error vector e.

2.  DETERMINING  GROBNER  BASES  OF  THE  ERROR- 
LOCATOR  IDEAL 

For brevity, define the following polynomials:

$f_{ab} = E_1 X_1^a Y_1^b + E_2 X_2^a Y_2^b + \cdots + E_v X_v^a Y_v^b - s_{ab}, \qquad (2)$

$h_j = C(X_j, Y_j), \qquad (3)$

$l_{1j} = X_j^q - X_j, \quad l_{2j} = Y_j^q - Y_j, \quad l_{3j} = E_j^{q-1} - 1, \qquad (4)$

over the set of variables $X_j$, $Y_j$, $E_j$ for $1 \le j \le v$. For a received
word r = c + e with $v = wt(e) \le t$, the problem of decoding
Hermitian codes is equivalent to solving for the common zeros of the
following set of multivariate non-linear equations: $f_{ab} = 0$ for $a +
b \le j$, and the equations $h_j = 0$, $l_{1j} = 0$, $l_{2j} = 0$, $l_{3j} = 0$ for
j = 1,2,...,v.

Consider the polynomial ring $K[X_1, Y_1, E_1, \ldots, X_v, Y_v, E_v]$ and
the following set of polynomials: $F = F_1 \cup F_2 \cup F_3$, where the sets
$F_i$ are given by $F_1 = \{f_{ab} : a + b \le j\}$, $F_2 = \{h_j : 1 \le j \le v\}$, and
$F_3 = \{l_{ij} : 1 \le i \le 3, 1 \le j \le v\}$ with the polynomials $f_{ab}$, $h_j$ and
$l_{ij}$ being defined by (2), (3) and (4), respectively. Thus, the problem
of decoding Hermitian codes is equivalent to a determination of the
variety V(F) or its equivalent V(I(F)). The key observation is the
following relation between the ideal I(F) and the error-locator ideals
$I_e(X_j,Y_j)$:

Theorem 1 $I(F) \cap K[X_j,Y_j] \subseteq I_e(X_j,Y_j)$ for j = 1,2,...,v, and
$V(I_e(X_j,Y_j)) = V(I(F) \cap K[X_j,Y_j])$ for j = 1,2,...,v.

In order to solve for the error-locations from the error-locator
ideal $I_e(X_j,Y_j)$, one needs to determine a set of generators for this
ideal. First, define the projection sets $EP_x = \{\alpha : (\alpha,\beta) \in EP_{xy}\}$
and $EP_y = \{\beta : (\alpha,\beta) \in EP_{xy}\}$. Next, define the “purely
lexicographical” (PLEX) ordering of the m-tuples $(a_1, a_2, \ldots, a_m)$ as
follows: $(0,0,\ldots,0) <_P (1,0,\ldots,0) <_P (2,0,\ldots,0) <_P \cdots <_P
(0,1,\ldots,0) <_P (0,2,\ldots,0) <_P \cdots$. Theorem 1 implies the following
important theorem for the normalized reduced Grobner basis
(NRGB) of I(F):

Theorem 2 Let $G_P$ be the NRGB of I(F) w.r.t. the PLEX ordering
of the exponents of the monomials $X_1^{a_1} Y_1^{a_2} E_1^{a_3} \cdots X_v^{a_{3v-2}} Y_v^{a_{3v-1}} E_v^{a_{3v}}$.
Then $G_P \cap K[X_1,Y_1] = \{g_2(X_1,Y_1), g_1(X_1)\}$ and
$V(G_P \cap K[X_1,Y_1]) = EP_{xy}$, where $g_2(x,y) \in K[x,y]$ and $g_1(x) \in K[x]$.

The above theorems provide an approach for producing from I(F) a
minimal set of generators for the ideal $I(F) \cap K[X_j,Y_j]$. Following
this approach, a decoding method based on Buchberger’s algorithm
[4] is developed as follows:

Decoding Method:

(1) Initialize: Give F and set v = 0.

(2) Set v = v + 1 and apply the Buchberger algorithm (w.r.t. PLEX
ordering) to F.

(3) If |V(F)| = 0 and v < t, go to (2); otherwise, find $G_P \cap K[X_1,Y_1]$,
where $G_P$ is the set of generator polynomials obtained by
Buchberger’s algorithm.

(4) Determine the error positions by solving $G_P \cap K[X_1,Y_1]$ for
$V(G_P \cap K[X_1,Y_1])$.

(5) Solve for the error magnitudes $e_i$.
Summary

In this paper, a new approach to determine a
lower bound for the generalized Hamming weights
of algebraic-geometric (AG) codes is discussed.

Let LS be a location set and let $H = \{h_1, h_2, \ldots, h_n, \ldots\}$
be a well-behaving sequence of monomials
based on LS. Let $\Gamma\{h_{r_1}, \ldots, h_{r_p}\}$ be a subset of LS
called a maximal partially linearly dependent
location set, on which $h_r$ is consistently and
partially linearly dependent on its previous monomials.
Define $D\{h_{r_1}, \ldots, h_{r_p}\} = |\Gamma\{h_{r_1}, \ldots, h_{r_p}\}|$. It is
called the consistent dependent-degree of monomials
$h_{r_1}, \ldots, h_{r_p}$. We define
$D_p^{(r)} = \max\{D\{h_{i_1}, h_{i_2}, \ldots, h_{i_p}\} \mid 1 \le i_1 < i_2 < \cdots < i_p \le r\}$.

Theorem: For a linear code $C_r$ defined by $H_r =
[h_1, h_2, \ldots, h_r]^T$, if there is some $d^*$ such that
$D^{(r)}_{r-d^*+h+1} < d^* - 1$, then the generalized Hamming
weight $d_h$ is equal to or greater than $d^*$.

Thus, the determination of a lower bound of
the generalized Hamming weights reduces to the
calculation of $D_p^{(r)}$. Using an improved Bezout
theorem, for the AG codes defined by a large class
of plane curves, the value of $D_p^{(r)}$ can be easily
determined. In the following we show one example.
Let the curve be a Hermitian curve over
GF($2^4$): $x^5 + y^4 + y = 0$. We have the following
well-behaving sequence H:

$H = \{1, x, y, x^2, xy, y^2, x^3, x^2y, xy^2, y^3, x^4,
x^3y, x^2y^2, xy^3, x^5, x^4y, x^3y^2, x^2y^3, \ldots\} = \{x^i y^j \mid 0
\le i \le 15, 0 \le j \le 3\}$.

Let us consider $C_{16}$, i.e., r = 16. The first 16
monomials are as follows: $\{1, x, y, x^2, xy, y^2, x^3,
x^2y, xy^2, y^3, x^4, x^3y, x^2y^2, xy^3, x^5, x^4y\}$. Using
the calculation of $D_p^{(r)}$, we have the following
values.

D_1^(16) = 21    D_2^(16) = 17    D_3^(16) = 16    D_4^(16) = 13
D_5^(16) = 12    D_6^(16) = 10    D_7^(16) = 9     D_8^(16) = 8
D_9^(16) = 7     D_10^(16) = 6    D_11^(16) = 5    D_12^(16) = 4
D_13^(16) = 3    D_14^(16) = 2    D_15^(16) = 1    D_16^(16) = 0.

From these values and the above theorem, we
have $d_1(C_{16}) \ge 12$, $d_2(C_{16}) \ge 15$, $d_3(C_{16}) \ge 16$,
$d_4(C_{16}) \ge 19$, $d_5(C_{16}) \ge 20$, $d_6(C_{16}) \ge 21$, $d_7(C_{16})
\ge 23$, and $d_h(C_{16}) \ge h + 16$, for h = 8, 9, 10, 11,
..., 48.
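
The step from the tabulated values to the quoted bounds can be reproduced mechanically. The sketch below is a reading of the theorem under an explicit assumption: the condition is taken as $D^{(r)}_{r-d^*+h+1} < d^* - 1$ (strict inequality, with the subscript restricted to 1, ..., r), and the best bound for each h is the largest $d^*$ satisfying it. The function name is illustrative.

```python
# Tabulated values D_1^(16), ..., D_16^(16) from the example.
D = [21, 17, 16, 13, 12, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
r = len(D)

def weight_bound(h):
    """Largest d* with D_p < d* - 1, where p = r - d* + h + 1 must lie in 1..r."""
    best = None
    for d_star in range(h + 1, r + h + 1):   # this range keeps p between r and 1
        p = r - d_star + h + 1
        if D[p - 1] < d_star - 1:
            best = d_star
    return best

bounds = [weight_bound(h) for h in range(1, 49)]
```

Under this reading the first seven bounds come out as 12, 15, 16, 19, 20, 21, 23, and every h from 8 to 48 gives h + 16, matching the values quoted in the text.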

Using  this  new  approach,  some  more 
efficient  linear  codes  with  the  minimum  distances 
4,  5,  6  and  any  lengths  over  GF(2m ),  and  some 
more  efficient  AG  codes  have  also  been  con¬ 
structed  in  this  paper. 
Recently, fast decoding methods ([1] [2] [3], etc.) of
algebraic-geometric (AG) codes have been proposed as applications
of the Sakata algorithm (the multidimensional Berlekamp-Massey
algorithm) [4]. Similar but distinct fast decoding algorithms
have been presented in [5] [6], etc. Among them,
[3] [5] [6] give fast decoding methods for generic one-point AG
codes from any algebraic curves in the projective space. These
methods are more efficient than the original Feng-Rao decoding
method [7]. In particular, [3] is concerned with the
multidimensional syndrome array instead of with the syndrome
matrix, and employs a unique scheme of majority logic to
find the unknown syndrome values necessary for decoding up
to half the Feng-Rao bound (designed distance) $d_{FR}$ in the
framework of the Sakata algorithm, where $d_{FR}$ is greater than or
equal to the Goppa bound $d_G$ in general [8].

To improve the probability of correct decoding, it is desirable
to devise an efficient decoding algorithm which can correct
both errors and erasures. Skorobogatov and Vladut [9]
were the pioneers of erasure-and-error decoding of AG codes.
Their method can correct t errors and r erasures such that
$2t + r < d_G - g$, where g is the genus of the curve defining
the AG code. Extending their error-only decoding method
[7], Feng and Rao [10] gave an erasure-and-error decoding
method which can correct t errors and r erasures such that
$2t + r < d_{FR}$.

In this paper we propose a fast erasure-and-error decoding
method based on a unification of our error-only decoding
method [3] and the algorithm [11] for finding a minimal polynomial
vector set of a vector of multidimensional arrays. Our
main concern is how to find the unknown syndrome values and
the error locations in addition to the given erasure locations
more efficiently than the Feng-Rao scheme based on matrix
calculations [10].

We take a one-point AG code (over a finite field K) $C :=
\{(c_1, \ldots, c_n) \in K^n \mid \sum_{j=1}^{n} c_j f(P_j) = 0,\ f \in L(mP_\infty)\}$ from an
irreducible nonsingular projective curve $\mathcal{C}$, where $L(mP_\infty)$ is
a linear subspace of the algebraic function field $K(\mathcal{C})$ which is
composed of functions f having a single pole of order $o(f) \le m$
at $P_\infty$. In fast decoding of AG codes, we manipulate two
kinds of entities, i.e., functions f ($\in K[\mathcal{C}] := \cup_{m \ge 0} L(mP_\infty)$)
(treated as multivariate polynomials) and multidimensional
syndrome arrays. For our purpose, a kind of vectorial notation
or data structure is crucial. That is, while we can represent
a multidimensional (error or erasure) syndrome array
u as an array vector $(u^{(1)}, \ldots, u^{(\lambda)})$ having $\lambda$ component 1D
arrays $u^{(i)}$, $1 \le i \le \lambda$, we represent each (error locator or
erasure locator) polynomial (i.e., function) f as a polynomial
vector $(f^{(1)}, \ldots, f^{(\lambda)})$ having $\lambda$ component univariate polynomials
$f^{(i)}$, $1 \le i \le \lambda$, where $\lambda$ is the smallest nonzero nongap
(pole order) of functions $f \in K[\mathcal{C}]$. Including an algorithm
(Algorithm 1) similar to that presented in [11], we can construct
a fast erasure-and-error decoding algorithm consisting
of two stages. In the first stage, we find a system of $\lambda$ erasure
locator polynomial vectors by applying Algorithm 1 to
the erasure syndrome array vector, and by using its result,
we modify the errata (i.e., error plus erasure) syndrome array
vector. Then, in the second stage, we can find unknown
errata syndrome values by invoking a kind of majority logic
for the modified errata syndrome array vector with the aid of
Algorithm 1, and finally we obtain a system of $\lambda$ errata locator
polynomials. The computational complexity is of order
$O(\lambda n^2)$.
The (64,32,27) Hermitian Code and Its Application in Fading Channels

X. Chen and I. S. Reed


Introduction 

In a trellis-coded modulation (TCM) scheme, a transmitted message is
determined by the current received bit and a number of previously
received bits. Therefore, if the decoder makes a mistake, errors have
the possibility of propagating. Such a propagation of errors is
considered to be a drawback in some channels such as mobile radio
channels with slow-shadowing fading. In such a case errors due to
the shadowing can affect the decoding process of the symbols within
an unshadowed time period and lead to long-term error propagation.
Thus, in such a scenario, block-coded modulation (BCM) may have
advantages because the decoding of a received code block is
independent of any other blocks. The commonly used BCM schemes
include the extended Reed-Solomon (RS) codes combined with
M-ary Phase-Shift Keying (MPSK) signaling for the
bandwidth-limited fading channels. For example, the (16,8,9)
extended RS code, defined over GF($2^4$), is coded with a 16-PSK
signal set. In recent years one of the most exciting developments in
the field of error correcting codes has been the construction and decoding of
algebraic geometry (AG) codes. It is shown in [1] that a sequence of
Hermitian codes can be found by the use of results from AG which
generalize the original construction of the RS codes. It is proved that
under certain conditions there exist “good” codes within this
class of codes. Further, as an example, van Lint and Springer claim
that, for any practical channel, the specific AG code, namely the
(64,32,27) Hermitian code, has a considerably better performance
than the corresponding (16,8,9) extended RS code. In this paper, the
(64,32,27) Hermitian code and its application in fading channels are
discussed.

Definition and Encoding of the (64,32,27) Hermitian Code
To construct the (64,32,27) Hermitian code, consider the Hermitian
curve of degree m = 5, i.e. $C(x,y) = x^5 + y^4 + y = 0$ over GF($2^4$). This
curve has exactly n = 64 rational points and the genus of this curve is
g = 6. The set of these rational points can be computed from the
cyclic group of order 15, generated by the irreducible polynomial
$\mu(x) = x^4 + x + 1$. Denote this set by $P_{64} = \{(x_1,y_1), (x_2,y_2), \ldots, (x_{64},y_{64})\}$.
Since 64 > 8 × 5, a Hermitian code C can be defined by its parity
check matrix as follows:

$$H = \begin{pmatrix} \phi_1(x_1,y_1) & \cdots & \phi_1(x_{64},y_{64}) \\ \phi_2(x_1,y_1) & \cdots & \phi_2(x_{64},y_{64}) \\ \vdots & & \vdots \\ \phi_{32}(x_1,y_1) & \cdots & \phi_{32}(x_{64},y_{64}) \end{pmatrix}$$

where $\phi_1(x,y), \phi_2(x,y), \ldots, \phi_{32}(x,y)$ denote the monomials $x^a y^b$ for
$(a,b) \le_T (0,8)$ and a < 5 with $\le_T$ being the total ordering. Therefore,
the dimension of C is k = 32. The designed minimum distance of this
code is defined to be $d^* = 32 - 6 + 1 = 27$. The true minimum
distance d of C satisfies $d \ge d^*$ since g = 6 < 32. Finally the result
$d = d^* = 27$ is determined from Theorem 5 of [1]. A transform
encoding method of the (64,32,27) Hermitian code C is given by the
following theorem:

Theorem 1 Let $c(x,y) = \sum_{i=1}^{32} m_i \phi_i(x,y)$, where for i = 1, 2, ..., 32 the $m_i$
are the message symbols. Then $c = (c(x_1,y_1), c(x_2,y_2), \ldots, c(x_{64},y_{64}))$
is a codeword in C.


This theorem can be proved by Theorem 1 in [1] from the fact that the
code $C$ defined above is a self-dual code. A method for recovering the
message symbols is considered next. Let $c = (c_1, c_2, \ldots, c_{64})$,
$c_i \in GF(2^4)$, be the codeword encoded by the above encoding method,

and  leW,  =  £,“  cjxfyJ^'^j.yj),  for  i- 1.2 32, where  the  (x„y,) 

and $\phi_i(x_j,y_j)$ are defined as above. Then, the following theorem
holds:

Theorem 2 The message symbol vector $m$ of the codeword $c$ satisfies
$m = (d_1, d_2, \ldots, d_{32})$.

This  theorem  is  verified  readily  by  a  computer  search. 
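
As a sanity check on the construction above, the following Python sketch counts the affine solutions of $x^5 + y^4 + y = 0$ over $GF(2^4)$ by brute force. It assumes field elements are stored as 4-bit integers reduced modulo $x^4 + x + 1$; the helper names `gf16_mul`, `gf16_pow` and `hermitian_points` are illustrative and do not come from the paper.

```python
def gf16_mul(a, b):
    """Multiply two GF(16) elements (4-bit ints) modulo x^4 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:        # degree reached 4: reduce by x^4 + x + 1 (0b10011)
            a ^= 0x13
    return result


def gf16_pow(a, e):
    """Raise a GF(16) element to a non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf16_mul(result, a)
    return result


def hermitian_points():
    """All (x, y) in GF(16)^2 with x^5 + y^4 + y = 0 (addition is XOR)."""
    return [(x, y) for x in range(16) for y in range(16)
            if gf16_pow(x, 5) ^ gf16_pow(y, 4) ^ y == 0]


points = hermitian_points()
print(len(points))
```

The printed count should be 64, in agreement with the stated number of rational points $n = 64$.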

Successive-Erasure  Minimum-Distance  Decoding 
of the (64,32,27) Hermitian Code
A fast error-only decoding algorithm for the Hermitian codes is
developed in [2] by Feng and Rao. The Feng-Rao algorithm is then
generalized to error-and-erasures decoding (EED) in [3]. In this
paper, a new successive-erasure minimum-distance decoding for
the (64,32,27) Hermitian code is discussed. Let $r = (r_1, r_2, \ldots, r_{64})$ be
the received word corresponding to $c$. Consider the decoder to be a full
maximum-likelihood detector which stores all 16 likelihood functions
for each received symbol $r_i$, i.e. $p(r_i|c_j)$ for all $j$, where $p(\cdot|\cdot)$
denotes the conditional probability density function. Based on this
information the detector provides an estimate $\hat{c}_i$ for each $r_i$ such that
$p(r_i|\hat{c}_i)$ is the greatest. Next define an estimate of the log likelihood
ratio to be

$$L(\hat{c}_i) = \ln \frac{p(r_i|\hat{c}_i)}{\sum_{c_j \ne \hat{c}_i} p(r_i|c_j)}.$$

Then the decoding algorithm can be summarized as follows:

Algorithm (1) Successively erase pairs of symbols with the lowest
$L(\hat{c}_i)$ values and apply the Feng-Rao EED algorithm to the estimated
word $\hat{c} = (\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_{64})$ with erasures. (2) Iterate (1) 14
times. During each iteration of this process an estimate of the
transmitted codeword is obtained and stored. (3) The decoder chooses
the single codeword for which $p(r|c)$ is the greatest, i.e. the
codeword closest to the received vector $r$ in likelihood distance.
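
The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the errors-and-erasures decoder is passed in as a hypothetical callback `eed_decode(word, erased)`, the trial schedule (trial `t` erases the `2*t` least reliable symbols) is an assumption, and the toy repetition-code decoder exists only to make the example runnable.

```python
import math


def successive_erasure_decode(lik, eed_decode, n_iter):
    """lik[i][s] = p(r_i | symbol s); eed_decode(word, erased) -> codeword or None."""
    n = len(lik)
    # Hard decision: most likely symbol at each position.
    hard = [max(range(len(row)), key=row.__getitem__) for row in lik]
    # Reliability: log of the best likelihood over the sum of all the others.
    rel = []
    for row, s in zip(lik, hard):
        rest = sum(row) - row[s]
        rel.append(math.inf if rest == 0 else math.log(row[s] / rest))
    order = sorted(range(n), key=rel.__getitem__)   # least reliable first
    best, best_score = None, -math.inf
    for t in range(n_iter):
        erased = set(order[:2 * t])                 # assumed schedule: 0, 2, 4, ...
        cand = eed_decode(hard, erased)
        if cand is None:
            continue
        # Log-likelihood of the whole candidate codeword given r.
        score = sum(math.log(lik[i][c]) if lik[i][c] > 0 else -math.inf
                    for i, c in enumerate(cand))
        if score > best_score:
            best, best_score = cand, score
    return best


def toy_repetition_eed(word, erased):
    """Stand-in decoder: majority vote over the non-erased symbols."""
    kept = [s for i, s in enumerate(word) if i not in erased]
    if not kept:
        return None
    top = max(set(kept), key=kept.count)
    return [top] * len(word)


# Three weakly wrong symbols and two strongly correct ones (binary alphabet).
lik = [[0.45, 0.55]] * 3 + [[0.99, 0.01]] * 2
decoded = successive_erasure_decode(lik, toy_repetition_eed, n_iter=3)
print(decoded)
```

For the Hermitian code, `eed_decode` would be replaced by an errors-and-erasures decoder such as the one in [3], and `n_iter` would be 14.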

The  (64,32,27)  Hermitian-Coded  16-PSK  Scheme 
A block coded MPSK scheme is developed by combining the
(64,32,27) Hermitian code, defined over $GF(2^4)$, with a $2^4$-PSK
signal set. In this combination the rate of the coded scheme is the
same as that of the uncoded $2^4$-PSK. But the time diversity of the coded
scheme is determined by the minimum Hamming distance ($d = 27$) of
the code. Since the minimum Hamming distance of the (64,32,27)
Hermitian code is much larger than that of the corresponding (16,8,9)
extended RS code, a high coding gain is expected for the new
(64,32,27) Hermitian-coded 16-PSK scheme. In evaluating the error
bounds of this coded scheme on a Rayleigh fading channel at
bit-error rates around $10^{-5}$, more than a 26 dB coding gain, compared
to uncoded QPSK, is obtained by the use of the new
successive-erasure minimum-distance decoder.

Abstract — A new construction of linear codes from
algebraic curves is introduced. In essence, the construction
is of the BCH type; namely, it extends
the method of constructing BCH codes to the construction
of codes from algebraic curves. As a consequence,
a new class of codes is constructed without
relying much on algebraic geometry. A comparison
to algebraic-geometric codes from Hermitian curves
shows that our codes typically have much larger
minimum distance at higher code rates. In particular,
compared to Hermitian codes on $H(2^\alpha)$, which
have length $2^{3\alpha}$, at higher code rates our codes
have minimum distance at least $2^{\lfloor \alpha/4 \rfloor}$ times greater
than that of the Hermitian codes. Examples have
also shown that, for the same code length and designed
minimum distance, our codes can have higher
dimension compared to codes constructed from the
approach given by Feng and Rao.

Constructing linear codes from algebraic curves is a relatively
new technique for obtaining codes of better rate or
higher minimum distance, as well as codes of longer length.
It was proved by Tsfasman, Vlăduţ and Zink [1] that a sequence
of codes which exceeds the Gilbert-Varshamov bound can be
constructed from algebraic curves using Goppa's construction.
Codes constructed by Goppa's approach are now called
algebraic-geometric (AG) codes. Lately, much work has been
done toward non-algebraic-geometric or simplified constructions
of AG codes [2, 3, 4]. Most recently, based on their
simplified approach to AG codes, Feng and Rao constructed
improved AG codes [5].

In this paper, a new method of constructing linear codes
from algebraic curves is introduced. In essence, the construction
is of the BCH type; namely, it extends the method
of constructing BCH codes to the construction of codes from
algebraic curves. As a consequence, a new class of codes is
constructed without relying much on algebraic geometry. A
comparison to algebraic-geometric codes from Hermitian curves
shows that our codes typically have much larger minimum
distance at higher code rates. In particular, compared to
Hermitian codes on $H(2^\alpha)$, which have length $2^{3\alpha}$, at higher
code rates our codes have minimum distance at least $2^{\lfloor \alpha/4 \rfloor}$
times greater than that of the Hermitian codes. Examples
have also shown that, for the same code length and designed
minimum distance, our codes can have higher dimension
compared to codes constructed from the approach given by Feng
and Rao [5].

A  brief  description  of  the  construction  follows: 

Let $\alpha$ be a primitive element of $GF(q^2)$. For $i = 0, \ldots, q^2 - 1$,
denote $\alpha_0 = 0$ and $\alpha_i = \alpha^{i-1}$ for $i \ge 1$. Let $\beta_{i,1}, \ldots, \beta_{i,q}$ be
the $q$ solutions of $y^q + y = \alpha_i^{q+1}$ over $GF(q^2)$. Then we have
$q^3$ distinct pairs $(\alpha_i, \beta_{i,j})$, which correspond to all the rational

1This  work  was  supported  by  the  National  Science  Foundation 
under  Grants  NCR-9406043. 


points of the Hermitian curve $H(q)$: $U^{q+1} + V^{q+1} + W^{q+1} = 0$,
except a point at infinity. Let $n$ be an integer with $0 < n \le q^3$ and
$\theta = \lfloor n/q^2 \rfloor$; then $n = \theta q^2 + \ell$. We denote every $x \in GF^n(q^2)$
by $(x_{0,1}, \ldots, x_{q^2-1,1}, x_{0,2}, \ldots, x_{q^2-1,2}, \ldots, x_{0,\theta}, \ldots, x_{q^2-1,\theta})$ if
$\ell = 0$, and by $(x_{0,1}, \ldots, x_{q^2-1,1}, \ldots, x_{0,\theta+1}, \ldots, x_{\ell-1,\theta+1})$
if $\ell \ne 0$. Let $\delta$ be a positive integer. Define

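$$R_v := \{0, 1, \ldots, \lfloor \delta/v \rfloor - 1\} \quad \text{for } v \le \delta$$

and

$$R := \bigcup_{v=1}^{\delta_n} \{(u, v-1) \mid u \in R_v\}, \quad \text{where } \delta_n = \min\{\delta, \lceil n/q^2 \rceil\}.$$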

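Then, we define the following linear code:

$$C(n, \delta) := \Big\{ c \in GF^n(q^2) \;\Big|\; \sum_{i=1}^{\lceil n/q^2 \rceil} \sum_{t=0}^{k_i} c_{t,i}\, \alpha_t^{u}\, \beta_{t,i}^{v} = 0, \ (u, v) \in R \Big\},$$

where $k_j = q^2 - 1$ for $j = 1, \ldots, \theta$ and $k_{\theta+1} := \ell - 1$ when
$\ell \ne 0$.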

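Theorem $C(n, \delta)$ has minimum distance $d \ge \delta + 1$. The dimension
of $C(n, \delta)$ is $\ge n - \delta - \delta_{n,\delta}$, where $\delta_{n,\delta} := \sum_{i=2}^{\delta_n} \lfloor \delta/i \rfloor$,
with equality holding for $\delta < q^2$.

We shall call $\delta + 1$ the designed minimum distance of $C(n, \delta)$.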

Example Consider the Hermitian curve $H(8)$ over $GF(64)$.
Then $C(512, 10)$ is a $(512, 487, 11)$ code. A one-point AG code
on $H(8)$ of length 512 and dimension 487 has actual minimum
distance 7 [6]. Moreover, from the same curve, Feng and Rao's
improved geometric Goppa codes of length 512 and designed
minimum distance 11 have dimension at most 484.
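
The dimension in this example can be reproduced by counting the parity checks in $R$. The short Python sketch below assumes the reading $|R_v| = \lfloor \delta/v \rfloor$ and $\delta_n = \min\{\delta, \lceil n/q^2 \rceil\}$ of the definitions above; the function name `dimension_lower_bound` is illustrative only.

```python
def dimension_lower_bound(n, delta, q):
    """n minus the number of parity checks |R| = sum_{v=1}^{delta_n} floor(delta / v)."""
    blocks = -(-n // q**2)                 # ceil(n / q^2) in integer arithmetic
    delta_n = min(delta, blocks)
    checks = sum(delta // v for v in range(1, delta_n + 1))
    return n - checks


# H(8) over GF(64): q = 8, n = 512, delta = 10.
k = dimension_lower_bound(512, 10, 8)
print(k)
```

Here the 25 checks (10 + 5 + 3 + 2 + 2 + 1 + 1 + 1) give 512 - 25 = 487, the dimension quoted in the example.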

Abstract — In this paper we propose a fast parallel
decoding algorithm for general one-point algebraic
geometric (1-pt AG) codes with a systolic array
architecture (SAA). This algorithm is able to correct up to
half the Feng-Rao bound, and the time complexity is
$O(n)$ using a series of $O(n)$ processors, where $n$ is
the code length and each processor is composed of $r$
cells for the smallest nonzero non-gap value $r$.

Our decoding algorithm is a parallel version of the decoding
algorithm given in [5], which is a special version of the
multidimensional Berlekamp-Massey (multi-D BM) algorithm. This
algorithm is implemented with an SAA. In [6], we recently
presented a parallel version of the 1D BM algorithm with an SAA
which can be applied to decoding of Reed-Solomon codes and
BCH codes. In this paper we present a scheme which is
motivated by the systolic algorithm [6]. To implement the parallel
computation, we introduce the concept of a discrepancy
polynomial having discrepancies as coefficients of its terms in the
multi-D BM algorithm.

Let $X$ be a curve with genus $g$ over a finite field $F$.
$P_1, \ldots, P_n$ and $Q$ are distinct $F$-rational points on $X$.
$D := P_1 + \cdots + P_n$ and $G := mQ$. $T$ is the set of all non-gap values
at $Q$ and $r := \min\{t \in T \mid t \ne 0\}$. For each $0 < i \le r - 1$,
$v_i := \min\{t \in T \mid t \equiv i \pmod r\}$ and $v_r := r$. $\{v_1, \ldots, v_r\}$ is
the minimal set of generators for the semigroup $T$ under
addition. For $1 \le i \le r$, let $\phi_i$ be a function in $L(\infty Q)$
with $-v_Q(\phi_i) = v_i$, where $v_Q(\cdot)$ denotes the valuation at $Q$.
v  :=  (vi, . . .  ,vr),  e0  :=  (0, . . . ,  0),  ei  :=  (1,0, . . .  ,0), . ,  er  := 
(0, . . . ,  0, 1)  £  Z \  where  Z+  is  the  set  of  all  non-negative  in¬ 
tegers.  E  :=  Z J  and  E  :=  {e,  4-  keT\k  £  Z+,  0  <  *  <  r  —  1). 
For  any  p  €  E,  •  ••  €  L(ooQ)  and  — vq(^)  = 

ELi  PiV,=:(p-v). 

The general 1-pt AG code $C$ of length $n$ over $F$ is defined
as follows: for $c \in F^n$, $c \in C$ iff $\sum_{j=1}^{n} c_j \phi^p(P_j) = 0$ for all
$\phi^p \in L(mQ)$, i.e., all $p \in E$ s.t. $(p \cdot v) \le m$. $d_{FR}$ denotes
the Feng-Rao designed distance defined in [3]. Let $(e_j)_{1 \le j \le n}$
be an error vector. For all $p \in E$, we define the syndrome as
$S_p := \sum_{u=1}^{n} e_u \phi^p(P_u)$, where the number of errors is at most
$\lfloor (d_{FR} - 1)/2 \rfloor$. All $S_p$ are known for $(p \cdot v) \le m$, but
$S_p$ with $m + 1 \le (p \cdot v)$ are unknown. To
correct up to $\lfloor (d_{FR} - 1)/2 \rfloor$ errors, we must find the values of the
unknown syndromes $S_p$ s.t. $m + 1 \le (p \cdot v) \le N := d_{FR} + 3g - 2$,
from [1]. Using the majority scheme [5], however, we can find
their values.

For  each  0  <  t  <  r  —  1  and  s,  £  {ei  +  keT\k  £  Z_|_ } ,  we  intro¬ 
duce  the  following  generator  polynomial  and  its  discrepancy 
polynomial.  /(,)( x)  :=  /j^x*  and  ^(x)  :=  x*  e 

F[z]  where  k  £  E  s.t.  (fc-v)  <  (si  *v),  n  £  stH-E  s.t.  (n-v)<N, 
and  d ^  Moreover,  for  the  above  t,  n  and 

0  <  ji  <  r  —  1,  we  consider  the  auxiliary  polynomials 

and  e^(jc)  with  span  Cji  where  n  —  Si  —  Cj{  =  keT ,  \k\  £  Z+. 

1  Email:  kurihara@cs.uec.ac.jp  and  sakata@cs.uec.ac.jp 


We  set  their  initial  data  as  follows:  For  each  0  <  *  <  r  —  1, 
Si  ■■=  Si,  /^(x)  :=  Xr‘  and  <2(,)(x)  :=  J2f,Snxn,  cji  :=  eu  -eT 
and  g^3'\x)  =  e^'\x)  :=  0. 

We consider the following systolic array (see Fig. 1). The
systolic array is composed of a series of $N$ processors, where
each processor is composed of $r$ cells. In each cell the 1D BM
algorithm is carried out not per polynomial but per term of the
polynomial. All processors synchronously receive data from the
left-neighboring processor and send data to the right-neighboring
processor. We call a unit of synchronized operations a beat,
where each beat is composed of a fixed small number of arithmetic
operations over $F$ and is assumed to take $O(1)$ time.
The number of beats necessary for executing our
algorithm is at most $3N$. To correct up to $\lfloor (d_{FR} - 1)/2 \rfloor$ errors,
our algorithm achieves an optimal $O(n)$ computing time by using
a series of $O(n)$ processors, where we assume $O(n) = O(N)$.
Each processor has $O(r)$ space complexity, and thus the total
time and space complexity is $O(rn^2)$. In general, $r < n$; e.g.
for codes from Hermitian curves, $O(r) = O(n^{1/3})$. Moreover,
in [4], Kötter proposes a parallel Berlekamp-Massey type algorithm
for Hermitian codes, whose time complexity is $O(n^2)$
using $r$ processors, where each processor is composed of $O(n)$
registers. Thus, Kötter's total complexity is $O(rn^3)$.

I. Introduction

The first criterion of self-duality for geometric Goppa codes
was given by Driencourt and Michon [1] for codes constructed
from elliptic curves. More general criteria can be
found in [2, 4, 5, 6, 8]. Our aim is to effectively construct
self-dual geometric Goppa codes. One example is given at the end,
which was computed using the implementation of the Brill-Noether
algorithm written in AXIOM by the author (see [3]).

II.  Self-Dual  Geometric  Goppa  Codes  and 
Class  group 

Denote by $\mathbb{F}_q$ the finite field of $q$ elements. For
$a = (a_1, a_2, \ldots, a_n) \in \mathbb{F}_q^n$ and $b = (b_1, b_2, \ldots, b_n) \in \mathbb{F}_q^n$ we have the
outer product $a * b = (a_1 b_1, a_2 b_2, \ldots, a_n b_n) \in \mathbb{F}_q^n$. A linear code
$C \subseteq \mathbb{F}_q^n$, $n$ even, is said to be quasi self-dual if there exists a vector
$a = (a_1, a_2, \ldots, a_n) \in \mathbb{F}_q^n$, $a_i \ne 0$, such that $a * C = C^\perp$. Note
that if for each $a_i$ there exists $b_i$ such that $b_i^2 = a_i$, then the
code $b * C$ is self-dual with $b = (b_1, b_2, \ldots, b_n)$. If $\mathrm{char}\, \mathbb{F}_q = 2$
then such $b_i$ always exist.

For the rest of this abstract, $F$ denotes an algebraic function
field in one variable of genus $g$ with full constant field
$\mathbb{F}_q$. Denote by $\mathbb{P}_F$ the set of places of $F$ and by $\mathcal{D}_F$ the set
of divisors of $F$. The class group of $F$ is the factor group
$\mathcal{C}_F := \mathcal{D}_F^0 / \mathcal{P}_F$, where $\mathcal{D}_F^0$ is the subgroup of $\mathcal{D}_F$ consisting of
all divisors of degree zero and $\mathcal{P}_F$ is the subgroup of principal
divisors. The group $\mathcal{C}_F$ is finite. Its order $h_F = h$ is called
the class number of $F$ (see [7, V.1.3]). To compute the class
number, one can use the Zeta-function of an algebraic function
field in one variable (see [7, V.1.15 and V.1.17]).

Proposition 1 Let the divisor $D := P_1 + P_2 + \cdots + P_n$ be
the sum of $n = 2k$ pairwise distinct places of $F$ of degree 1.
Assume that the class group $\mathcal{C}_F := \mathcal{D}_F^0 / \mathcal{P}_F$ is cyclic of order
$h \not\equiv 0 \pmod 2$ and has a generator $A$ with disjoint support
from that of $D$. Assume moreover that there exists a divisor
$B$ of degree 1 with disjoint support from that of $D$. Then
there exists an integer $m \in \{0, 1, \ldots, h - 1\}$ such that with
the divisor

$$G := (k + g - 1)B + mA$$

the geometric Goppa code

$$C_{\mathcal{L}}(D, G) := \{(f(P_1), f(P_2), \ldots, f(P_n)) \in \mathbb{F}_q^n \mid f \in \mathcal{L}(G)\}$$

is quasi self-dual.

Proof: Since $h \not\equiv 0 \pmod 2$, the divisor $2A$ is also a generator
of the class group. Hence there exists an integer
$m_1 \in \{0, 1, \ldots, h - 1\}$ such that

$$D \sim nB + 2m_1 A = 2(kB + m_1 A).$$

For the same reason there exists an integer
$m_2 \in \{0, 1, \ldots, h - 1\}$ such that $(2g - 2)B + 2m_2 A$ is a canonical
divisor. If we take $m \in \{0, 1, \ldots, h - 1\}$ with
$m \equiv m_1 + m_2 \pmod h$ and set $G := (k + g - 1)B + mA$, we have

$$2G - D \sim (2g - 2)B + 2m_2 A.$$

Hence $2G - D$ is a canonical divisor, which implies that
$C_{\mathcal{L}}(D, G)$ is quasi self-dual (see [9, Th. 3.1.46] or [6, Satz 1]). □

III.  Example 

Let  F  be  the  function  field  of  the  smooth  plane  quartic  X 
defined  by  the  following  equation 

x3z  +  x2v 2  +  xv3  +  +  y4  +  yz3  +  z*  =  0. 

The genus of $F$ is $g = 3$ and the class number over $\mathbb{F}_2$ is $h = 3$.
Over $\mathbb{F}_2$, $F$ has one place of degree 1, one place of degree
2 and 7 places of degree 4. Let $P$ and $Q$ be respectively
the places of degree 1 and 2. The divisor $A := 2P - Q$ is
non-principal; thus it is a generator of the class group $\mathcal{C}_F$.
The intersection divisor of the curve $X$ with any line is a
canonical divisor (see [9, Prop. 2.2.7]). We take $K := 2P + Q$
as a canonical divisor, which is the intersection divisor of the
curve with the line $Z = 0$. Set the divisor $B := P$. Then
$4B + 2A$ is equivalent to $K$. Among the seven places of degree
4, considered as divisors, two are equivalent to $4B$, one is
equivalent to $4B + A$, and the remaining four are equivalent
to $4B + 2A$. Thus the sum of the seven places of degree 4, say
$D$, is equivalent to $28B + 9A \sim 28B$. Set $G := 16B + A$. Then
$2G - D$ is a canonical divisor (see the proof of Proposition 1).
Let $F_4 := \mathbb{F}_{2^4} F$. Let $D' := \mathrm{Con}_{F_4/F} D$ and $G' := \mathrm{Con}_{F_4/F} G$
(see [7, III.6.3 and V.1.9]). Then $C_{\mathcal{L}}(D', G')$ is a quasi
self-dual $[28, 14, d \ge 12]$ code over $\mathbb{F}_{2^4}$. In fact $d = 12$, since by
computing the generator matrix of the code we found a word
of weight 12.
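
The divisor-class bookkeeping in this example can be checked mechanically. The Python sketch below is an illustration only: it assumes that the class of $bB + aA$ can be stored as the pair `(b, a mod h)`, which is valid here because $B$ has degree 1 and $A$ generates a class group of order $h = 3$. The helper names `add` and `scale` are hypothetical.

```python
H = 3  # class number h from the example


def add(c1, c2):
    """Add two divisor classes stored as (coefficient of B, coefficient of A mod h)."""
    return (c1[0] + c2[0], (c1[1] + c2[1]) % H)


def scale(k, c):
    """Multiply a divisor class by the integer k."""
    return (k * c[0], (k * c[1]) % H)


# Seven places of degree 4: two ~ 4B, one ~ 4B + A, four ~ 4B + 2A.
places_deg4 = [(4, 0)] * 2 + [(4, 1)] * 1 + [(4, 2)] * 4

D = (0, 0)
for place in places_deg4:
    D = add(D, place)

G = (16, 1)          # G := 16B + A
K = (4, 2)           # canonical class, K ~ 4B + 2A
two_G_minus_D = add(scale(2, G), scale(-1, D))

print(D, two_G_minus_D)
```

The class of $D$ comes out as $28B$ and the class of $2G - D$ as $4B + 2A$, i.e. the canonical class, as the example asserts.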

Abstract — It is shown that certain syndromes of a
Hermitian code are not needed for decoding. These
syndromes can be replaced by data symbols, thereby
increasing the dimension of the code without changing
the designed minimum distance.

I.  Hermitian  Codes  and  Hyperbolic  Codes 

Hermitian codes and hyperbolic codes are defined on the affine
plane $GF(q)^2$. A hyperbolic code is defined for any $q$ and is
a two-dimensional cyclic code. A Hermitian code is defined
for $q$ an even power of two; it can be viewed as a shortened
two-dimensional cyclic code. Whereas a hyperbolic code is
defined on the full affine plane $GF(q)^2$, a Hermitian code is
defined on a curve in the affine plane $GF(q)^2$. Only the
affine plane is used so that the discussion can be organized around
the formalism of the two-dimensional Fourier transform

$$C_{j'j''} = \sum_{i=0}^{n-1} c_i\, \omega^{i'j'} \omega^{i''j''}.$$

The Hermitian polynomial

$$G(X, Y) = X^m - Y^{m-1} - Y$$

has $n = (m - 1)^3 - (m - 1)$ zeros in the affine plane over
$GF((m-1)^2)$ of the form $(\gamma, \beta)$ with $\gamma$ and $\beta$ both nonzero.
These zeros of $G(X, Y)$ are used to define a code over
$GF((m-1)^2)$ with blocklength $n$ and dimension $k = mJ - g + 1$
if $J > m$, and designed distance $d^* = n - k - g + 1$, where
$g = \binom{m-1}{2}$.

A codeword is a vector $c$ with components $c_i$ for
$i = 0, \ldots, n - 1$, where $i$ indexes the $n$ points $(i', i'')$ at which
$G(\omega^{-i'}, \omega^{-i''}) = 0$. The spectrum $C$ is required to satisfy

$$C_{j'j''} = 0 \quad \text{if } j' + j'' \le J.$$

There are $\frac{1}{2}(J + 1)(J + 2)$ such $(j', j'')$. The code is the set of
such codewords.

II. An Enlarged Code

We now enlarge the code to a new linear code that contains
the Hermitian code. As in the previous section, a codeword
is a vector with components $c_i$ for $i = 0, \ldots, n - 1$, where $i$
indexes the $n$ points $(i', i'')$ at which $G(\omega^{-i'}, \omega^{-i''}) = 0$. For
the enlarged code, the codewords satisfy

$$C_{j'j''} = 0 \quad \text{if } j' + j'' \le J \text{ and } (j' + 1)(j'' + 1) < d^* = n - k - g + 1.$$

Otherwise $C_{j'j''}$ is arbitrary. If the set $\{(j', j'') \mid (j' + 1)(j'' + 1) < d^*\}$
does not contain the set $\{(j', j'') \mid j' + j'' \le J\}$,
there will be fewer elements in the intersection than in the
second set. Because there are fewer such $(j', j'')$ than before,
the constraints are weaker. Then there will be more codewords
satisfying the new constraint, so the dimension of the code is
larger.

Syndrome $S_{j'j''}$ will be known only if $j' + j'' \le J$ and
$(j' + 1)(j'' + 1) < d^*$. It follows from the two-dimensional
form of Massey's theorem that each unknown syndrome can
be inferred by a subsidiary calculation in the Sakata algorithm
just at the time that it is needed. This uses an argument of
Saints and Heegard in the case that $(j' + 1)(j'' + 1) < d^*$,
and uses an argument of Sakata et al. (based on the ideas
of Feng and Rao) in the case that $j' + j'' \le J$. Because the
unknown syndromes that result from the new hyperbolic constraint
can be inferred by the decoder, there is no reduction
in the designed distance. (Apparently the performance of this
code cannot be found by the usual methods of algebraic geometry.)
Feng and Rao showed that the Hermitian code has true
minimum distance larger than its designed distance. We are
probably taking up the same slack in another way, increasing
the dimension by reducing the true minimum distance.

III.  Syndrome  Filling 

The Sakata algorithm is a generalization of the Berlekamp-Massey
algorithm to two dimensions, processing the two-dimensional
syndromes in some fixed total order. The graded
order works best for our purposes. The locator polynomial
update rule is based on a two-dimensional version of Massey's
theorem. At each iteration one or more discrepancies are computed
using the current error-locator ideal. If one or more discrepancies
are nonzero, then Massey's theorem describes how
the size of the error-locator ideal must increase.

It follows from Massey's theorem that certain syndromes
cannot be generated wrong by the Sakata recursion; otherwise
the error-locator ideal would grow too large.

IV.  Example 

An example shows that the class of codes defined contains
more than the usual Hermitian codes. We simply display one
code in which the set $\{(j', j'') \mid (j' + 1)(j'' + 1) < d^*\}$ does not
contain the set $\{(j', j'') \mid j' + j'' \le J\}$.

We choose the Hermitian code over $GF(256)$, so $m = 17$
and $d^* = 4080 - 17J$. Choose $J = 130$; then $d^* = 1870$.
Then consider $(j', j'') = (17, 113)$. Next, observe that
$(17, 113) \in \{(j', j'') \mid j' + j'' \le 130\}$. However,
$(j' + 1)(j'' + 1) = 18 \times 114 = 2052$. Therefore

$$(17, 113) \notin \{(j', j'') \mid (j' + 1)(j'' + 1) < d^*\}.$$

This means that syndrome $S_{17,113}$ is not needed by the
two-dimensional Berlekamp-Massey algorithm. Hence $C_{17,113}$ is
made into a data component, thereby enlarging the code. In
particular, the enlarged code has larger dimension with the
same designed distance.
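
The arithmetic of this example can be recomputed with a few lines of Python. The sketch assumes the relations $n = (m-1)^3 - (m-1)$ and $d^* = n - mJ$ used in the example, with the triangular condition read as $j' + j'' \le J$ and the hyperbolic condition as $(j'+1)(j''+1) < d^*$; the function names are illustrative.

```python
def designed_distance(m, J):
    """d* = n - m*J with n = (m-1)^3 - (m-1)."""
    n = (m - 1) ** 3 - (m - 1)
    return n - m * J


def in_triangle(jp, jpp, J):
    return jp + jpp <= J


def in_hyperbola(jp, jpp, d_star):
    return (jp + 1) * (jpp + 1) < d_star


m, J = 17, 130
d_star = designed_distance(m, J)
jp, jpp = 17, 113
product = (jp + 1) * (jpp + 1)
print(d_star, product)

# Spectral components constrained by the Hermitian code but free in the enlarged code.
freed = [(a, b) for a in range(J + 1) for b in range(J + 1 - a)
         if not in_hyperbola(a, b, d_star)]
print((jp, jpp) in freed)
```

Every pair in `freed` is a spectral component that becomes a data component in the enlarged code.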

Abstract — Based on matrix completion algorithms,
new constructions for algebraic multilevel codes are
given. The constructions have low computational
complexity and can be used for channels with combinations
of burst and random errors.

I.  Introduction 

Combinations of random and bursty errors occur in
many communication and storage systems. For example, consider
a conventional concatenated coding system operating
on a communication channel with a combination of
random and bursty errors. The error process at the output
of the convolutional decoder tends to be bursty, and the length
of the bursts varies depending on the channel output.
Assuming that interleaving has been used, the output of the
de-interleaver will contain errors of short burst lengths. In
an ideal situation, where the interleaving depth is enough to
remove all of the bursts, the outer code may view the error
process at the output of the de-interleaver as a purely random
error process. However, each codeword in an interleaving
frame will have a different share of the errors
produced by bursts of short lengths. Therefore, in each frame,
the decoder for the outer code may fail to decode some blocks
which suffer from a large number of errors, while it is
still capable of decoding the other blocks in the frame. One
may benefit from re-decoding the inner convolutional code as
a determinate state convolutional code, where the Viterbi decoder
is re-initialized periodically by known bits [2]. In this
way, the side information provided by successfully decoded
blocks can be used to improve the error correction capability
of the inner convolutional code. The matrix completion
approach presented in this paper can be used as a complementary
rather than competing technique, since the two can be used
at the same time.

In the matrix completion technique, syndrome information
for the algebraic outer codes is not provided explicitly. At
the first level of decoding, each frame is viewed as a single
codeword. However, at this level, no attempt is made to
find the error locations and the error values for each block.
Instead, some syndrome information is computed for each
block in the frame. This crucial step in the decoding process
is achieved by a matrix completion algorithm which is similar
to the Feng-Rao algorithm [1]. Now each block may be
viewed as a member of an algebraic code. For some blocks, the
computed syndrome information will be enough to remove all
of the errors. For others, the combination of the known syndrome
information and the determinate state Viterbi decoder
is used to enhance the performance of the purely algebraic
decoding algorithm. The efficiency of this scheme comes from
the following facts:

• The success of the completion algorithm for the reconstruction
of the syndrome information depends on
the overall number of errors in the frame.

• Even if the number of errors in one block is beyond
the error correction capability of the algebraic multilevel
code, one might still be able to reconstruct the
syndrome information.

• The complexity of the completion algorithm depends
mainly on the number of blocks and the number
of syndromes which are to be reconstructed.
The complexity grows only linearly with the size of the
blocks. Therefore, the overall complexity of the decoding
is much less than the complexity of decoding a large
code of the same length.

II.  Construction  of  the  Multilevel  Codes 

In the multilevel coding architecture, each frame

$$c = (c_1, c_2, \ldots, c_N)$$

consists of $N$ blocks. Each block $c_i$, $i = 1, 2, \ldots, N$, is a vector
of length $n_1$ over $GF(q)$. Assuming that $c$ is transmitted and
the word

$$r = (r_1, r_2, \ldots, r_N)$$

is received, the error pattern

$$e = (e_1, e_2, \ldots, e_N)$$

is defined by $e_i = r_i - c_i$. Let $\alpha$ be an element of the extension
field $GF(q^s)$ such that the order of $\Gamma$, the cyclic group generated
by $\alpha$, is $n_1$. The parameter $s$ is the smallest integer such
that $\alpha \in GF(q^s)$. The one-dimensional syndrome $s(e_i, \gamma)$ for
any $\gamma \in \Gamma$ is defined as

$$s(e_i, \gamma) = \sum_{j=0}^{n_1 - 1} e_{i,j}\, \gamma^{j}.$$

A linear multilevel code is defined to be the vector space of
codewords $\{c\}$ such that

$$s(c, \gamma) = (s(c_1, \gamma), s(c_2, \gamma), \ldots, s(c_N, \gamma))$$

falls in some specific subspaces of $GF(q^s)^N$.

Here, we use a matrix completion algorithm to reconstruct
$s(c, \gamma)$ for some specific values of $\gamma$.
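
To make the syndrome definition concrete, here is a small Python sketch. It is a simplification, not the construction of the paper: it assumes a prime field $GF(7)$ with $s = 1$ and $\alpha = 3$, whose cyclic group has order $n_1 = 6$, so that ordinary modular arithmetic suffices. The function names `syndrome` and `frame_syndrome` are hypothetical.

```python
P = 7        # assumed prime field GF(7), so the extension degree is s = 1
ALPHA = 3    # element of multiplicative order 6 in GF(7)
N1 = 6       # block length n_1 = order of the group generated by ALPHA


def syndrome(block, gamma, p=P):
    """One-dimensional syndrome: sum_j block[j] * gamma^j (mod p)."""
    return sum(e * pow(gamma, j, p) for j, e in enumerate(block)) % p


def frame_syndrome(frame, gamma, p=P):
    """Vector of block syndromes (s(c_1, gamma), ..., s(c_N, gamma))."""
    return [syndrome(block, gamma, p) for block in frame]


# The syndrome is linear: s(r, gamma) = s(c, gamma) + s(e, gamma).
c = [1, 2, 3, 4, 5, 6]
e = [0, 0, 1, 0, 0, 0]
r = [(ci + ei) % P for ci, ei in zip(c, e)]
print(syndrome(e, ALPHA), frame_syndrome([c, r], ALPHA))
```

Because the map is linear, the difference between the syndromes of `r` and `c` depends only on the error pattern, which is what lets a decoder work from syndrome information alone.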

Abstract — Parallel to the definition of the rate distortion
function for source coding, we define a rate
distortion function for delay in a queueing system
which gives the tradeoff between the capacity of the
server and the delay or buffer overflow incurred. This
function is decreasing and convex and it is shown to be
equal to the “effective bandwidth” of the input source
for exponentially vanishing buffer overflow probability.

Recent work on the information theoretic capacity of a
queue [1] and on the theory of “effective bandwidth” [2] has
prompted us to take a new look at the capacity-delay
tradeoff in a single server discrete time queue and view
this as analogous to a rate distortion function for the
source. Conventional source coding theory would allow
perfect reproduction of the source at any service rate
greater than the average rate of the input packets; however,
this would incur arbitrarily long delays. The tradeoff
between the rate and the delay is difficult to calculate
in general, but in the case when the distortion measure
is the buffer overflow probability, we show that the
limiting value of the rate distortion function is the effective
bandwidth of the source.

We consider a discrete time slotted queue, where the
input and the output processes consist of a sequence of
packets (e.g. ATM packets). Consider two (discrete-time)
point processes $a_i = \{a_i(t), t = 1, 2, \ldots\}$, $i = 1$ and
2. Let $A_i(0,t)$, $i = 1$ and 2, be the number of arrivals in
the interval $(0,t]$ of the point process $a_i$ and $T_i(n)$, $i = 1$
and 2, be the arrival epoch of the $n$th customer of the
point process $a_i$. We define a class of distance functions
between two point processes as follows:

$$\rho_q(a_1, a_2) = \limsup_{t \to \infty} \frac{1}{t} \sum_{s=1}^{t} E f(A_1(0,s), A_2(0,s)) \qquad (1)$$

for some function $f : \mathbb{R}^2 \to \mathbb{R}$. A similar distortion measure
can be defined using arrival instants. For instance,
if $a_1$ is an arrival process to a queue and $a_2$ is the corresponding
departure process, then $\rho_q(a_1, a_2)$ is the average
queue length and $\rho_d(a_1, a_2)$ is the average delay when
$f(z_1, z_2) = z_2 - z_1$.

We consider a queue with time-varying capacity $c(t)$ [3],
where $c(t)$ is the maximum number of packets served in
time slot $t$. Then the behaviour of the queue is governed
by the following recursive equation:
$q(t+1) = (q(t) + a(t+1) - c(t))^+$. We will call the sequence
$\{c(t), t \ge 0\}$ an independent bandwidth allocation sequence
if $\{c(t), t \ge 0\}$ is independent of the arrival process.
Let $T_c$ be the family of independent bandwidth allocation
sequences that are stationary and ergodic with mean $c$.
Also, let $a$ ($b$) denote the arrival (departure) process.
For the class of distance functions $\rho^q(a,b)$, we say that
a distortion $D$ is achievable at rate $c$ if there is an
independent bandwidth allocation sequence
$\{c(t), t \ge 1\} \in T_c$ such that $\rho^q(a,b) \le D$.
The rate distortion function $R^q(D)$ is defined to be the
minimum rate $c$ at which the distortion $D$ is achieved, i.e.,
$R^q(D) = \inf\{c : \rho^q(a,b) \le D\}$. We define $R^d(D)$ similarly.
One can show that the rate distortion functions $R^q(D)$
and $R^d(D)$ are both decreasing and convex in $D$.
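
The queue recursion above can be illustrated with a minimal Python sketch. The function names and the sample sequences are hypothetical, and the empirical averages only stand in for the limiting quantities of the text over a finite horizon; `arrivals[t]` is taken to hold a(t+1) and `capacities[t]` to hold c(t).

```python
def simulate_queue(arrivals, capacities, q0=0):
    """Iterate q(t+1) = max(q(t) + a(t+1) - c(t), 0) and return the whole path.

    arrivals[t] plays the role of a(t+1) and capacities[t] the role of c(t).
    """
    path = [q0]
    for a_next, c_now in zip(arrivals, capacities):
        path.append(max(path[-1] + a_next - c_now, 0))
    return path


def average_queue_length(path):
    """Finite-horizon stand-in for the average queue length."""
    return sum(path) / len(path)


def overflow_fraction(path, x):
    """Fraction of slots in which the queue length exceeds x."""
    return sum(1 for q in path if q > x) / len(path)


# Hypothetical bursty arrivals served at a constant one packet per slot.
path = simulate_queue([3, 0, 2, 0, 1], [1, 1, 1, 1, 1])
```

Raising the service sequence lowers both empirical measures, which is the trade-off the rate distortion function captures.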

In general, the calculation of $R^q(D)$ and $R^d(D)$ is a
difficult problem. However, motivated by the case of
ATM networks with exponentially small buffer overflow
probability, we consider the distance function

$$\rho^q_x(a,b) \triangleq \limsup_{t\to\infty} \frac{1}{t} \sum_{s=1}^{t} E\, 1\{A(0,s) - B(0,s) > x\} \qquad (2)$$

and let $R^q_x(D)$ denote the corresponding rate distortion
function. Note that $q(t) = A(0,t) - B(0,t)$ when $q(0) = 0$.
If $a(t)$ is stationary and ergodic with mean $Ea(t) < c$,
then $\rho^q_x(a,b) = \Pr(q(\infty) > x)$. In this case, we have
the following asymptotic result, based on the theory of
effective bandwidth [3]:

Theorem 1 If the arrival process $a(t)$ is stationary and
ergodic and it satisfies
$\lim_{t\to\infty} \frac{1}{t} \log E e^{\theta A(0,t)} = \Lambda(\theta)$
for all $\theta > 0$, then

$$\lim_{x\to\infty} R^q_x(e^{-\theta x}) = a^*(\theta), \qquad (3)$$

where $a^*(\theta)$ is called the effective bandwidth of the source,
and is defined to be $a^*(\theta) = \Lambda(\theta)/\theta$.
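
As a small numerical illustration of the effective bandwidth, the sketch below assumes i.i.d. per-slot arrivals, for which the limit in Theorem 1 reduces to the log moment generating function of one slot. That i.i.d. assumption, the function name, and the sample distribution are illustrative and not taken from the source.

```python
import math


def effective_bandwidth(pmf, theta):
    """Return Lambda(theta) / theta for i.i.d. per-slot arrivals.

    pmf maps a packet count to its probability. For i.i.d. slots,
    Lambda(theta) = log E[exp(theta * a(1))].
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    log_mgf = math.log(sum(p * math.exp(theta * k) for k, p in pmf.items()))
    return log_mgf / theta


# Hypothetical on/off source: 2 packets with probability 0.3, else none.
pmf = {0: 0.7, 2: 0.3}
mean_rate = sum(k * p for k, p in pmf.items())
peak_rate = max(pmf)
```

For this source the value moves from near the mean rate (small theta, loose overflow requirement) towards the peak rate (large theta, strict requirement).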

The concept of effective bandwidth has attracted considerable
interest lately, and a number of papers develop
a calculus of effective bandwidth that allows one to
analyze superpositions of sources, outputs of queues, and
networks of queues. The current work provides a new
interpretation of some of these results, and provides a link
between rate distortion theory and queueing theory.

Abstract - It is shown that if a queueing network is
stable  with  fluid  arrival  processes,  then  it  is  also  stable 
for  deterministically  constrained  bursty  arrival  processes 
of  the  same  or  smaller  long-term  rate. 

SUMMARY 

Fluid models of queueing networks are among the simplest
models to analyze, owing to the fact that calculus can be applied.
At the same time, wider classes of network models are more
flexible for modeling real traffic. It is thus useful to reduce questions
about the more realistic models to questions about related fluid
models. Such a reduction was recently achieved by J.G. Dai, who
showed that stability of a fluid model implies stability (in the
sense of Harris recurrence) of related multiclass networks with
random service and interarrival processes of renewal type. The
purpose here is to similarly reduce the question of stability for
networks with input traffic satisfying deterministic constraints in
the sense of Cruz (IEEE IT Transactions, January 1991) to a
question of stability for a fluid model.

The network has $d$ single-server stations and $K$ classes of
traffic. Class $l$ traffic is served at a unique station $s(l)$. Let $C$ be
the $d \times K$ matrix such that $C_{i,l} = 1$ if $s(l) = i$ and $C_{i,l} = 0$
otherwise. Upon completion of service at $s(l)$ the traffic of class $l$
either becomes traffic of class $l'$ for some other class $l'$, in which
case we write $l \to l'$, or it immediately exits the network. Let $P$
denote the $K \times K$ matrix such that $P_{l,l'} = 1$ if $l \to l'$ and $P_{l,l'} = 0$
otherwise. A simple network with three stations and eight classes
is shown in Figure 1 with $1 \to 2 \to 3$, $4 \to 5 \to 6$ and $7 \to 8$ and
$s = (1, 2, 3, 2, 3, 1, 3, 2)$. It is assumed that the network is open,
so that $P^K$ is the zero matrix.
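
The matrices for the sample network can be written out directly. The following pure-Python sketch uses hypothetical helper names; it builds C and P from the station assignment and routing given in the text and checks the openness condition that the K-th power of P vanishes.

```python
def routing_matrices(station, routes, d):
    """Build C (d x K) and P (K x K) from 1-based class and station labels."""
    K = len(station)
    C = [[1 if station[l] == i + 1 else 0 for l in range(K)] for i in range(d)]
    P = [[0] * K for _ in range(K)]
    for src, dst in routes:
        P[src - 1][dst - 1] = 1
    return C, P


def mat_mul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_pow(A, n):
    """n-th power of a square matrix by repeated multiplication."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


def is_open(P):
    """The network is open when the K-th power of P is the zero matrix."""
    return all(v == 0 for row in mat_pow(P, len(P)) for v in row)


# Sample network of the text: s = (1, 2, 3, 2, 3, 1, 3, 2),
# routes 1 -> 2 -> 3, 4 -> 5 -> 6 and 7 -> 8.
station = [1, 2, 3, 2, 3, 1, 3, 2]
routes = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8)]
C, P = routing_matrices(station, routes, d=3)
```

Each column of C has exactly one nonzero entry because every class is served at a unique station.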


Figure  1:  Sample  network. 


Exogenous traffic can enter the network as any class, though
for the example given it might make sense for the exogenous
arrival functions to be nonzero only for classes 1, 4 and 7. Let
$E_l(t)$ denote the amount of exogenous class $l$ traffic to enter the
network during $[0,t]$. Traffic of class $l$ can be served (at station
$s(l)$) at a maximum rate $\mu_l = 1/m_l$ where $m_l > 0$. Let
$M = \mathrm{diag}(m_1, \ldots, m_K)$ and let $e$ denote a column vector of all ones
(with dimension depending on the context). The flow of traffic
in the network is assumed to satisfy the following equations and
conditions:


$Q(t) = q + E(t) + (P^T - I) M^{-1} T(t)$   (1)

$I(t) = et - CT(t)$   (2)

$Q(t) \ge 0$ for $t \ge 0$   (3)

$\int (CQ(t) \wedge e)\, dI(t) = 0$   (4)

$T(0) = 0$, $T$ is right-continuous and nondecreasing   (5)

$I(0) = 0$, $I$ is right-continuous and nondecreasing.   (6)

The following interpretations hold:

$Q_l(t)$ is the amount of class $l$ traffic in the network at time $t$,
and $Q_l(0) = q_l$.

$T_l(t)$ is the amount of work (where work is measured in units of
time) done on class $l$ traffic during $[0,t]$.

$(CT(t))_i$ is the amount of work done at station $i$ during $[0,t]$.

$I_i(t)$ is the amount of idleness (measured in units of time) of the
server at station $i$ accumulated during $[0,t]$.

$(CQ(t))_i$ is the amount of traffic at station $i$ at time $t$.

The exogenous traffic $E$ is said to satisfy deterministic
constraints with parameters $a = (a_l)$ and $\sigma = (\sigma_l)$, abbreviated to
"$E$ is DC$(a,\sigma)$ traffic", if

$$0 \le E_l(t) - E_l(s) \le a_l (t - s) + \sigma_l, \qquad 0 \le s \le t < \infty. \qquad (7)$$

The network $(C, P, m)$ is totally stable for DC$(a,\sigma)$ traffic if there
is a finite constant $\Gamma$ so that whenever $(E, q, Q, T, I)$ satisfy (1)-(7),
then $\limsup_{t\to\infty} |Q(t)| \le \Gamma$, where $|Q(t)| = \sum_l |Q_l(t)|$.
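
To make the deterministic constraint (7) concrete, here is a minimal sketch that checks one class of traffic against a rate and a burst parameter. It assumes the cumulative arrival function is sampled at integer times only; the function names and the sample trace are hypothetical.

```python
def is_dc_traffic(cumulative, rate, burst, tol=1e-12):
    """Check 0 <= E(t) - E(s) <= rate * (t - s) + burst for all s <= t.

    cumulative[k] is the cumulative exogenous arrival E(k) at integer time k.
    """
    n = len(cumulative)
    for s in range(n):
        for t in range(s, n):
            increment = cumulative[t] - cumulative[s]
            if increment < -tol or increment > rate * (t - s) + burst + tol:
                return False
    return True


def smallest_burst(cumulative, rate):
    """Smallest burst parameter for which the sampled trace satisfies (7)."""
    n = len(cumulative)
    return max(cumulative[t] - cumulative[s] - rate * (t - s)
               for s in range(n) for t in range(s, n))


# Hypothetical trace: a burst of 3 units in the first slot, then sparse arrivals.
trace = [0, 3, 3, 4, 4, 5]
```

A fluid input of the same rate corresponds to a zero burst parameter, which is the comparison made in the theorem.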

Theorem 1 If the network $(C, P, m)$ is totally stable for
fluid traffic with input rate vector $a$, then it is totally stable for
DC$(a,\sigma)$ traffic, for any vector $\sigma$.

Abstract - We extend the threshold condition
for  optimality  of  the  policy  for  activating  the  slow 
server  in  a  2-server  queueing  system  when  the  service 
times  are  deterministic. 

I.  Introduction 

We consider a queueing system composed of an
infinite-size buffer and two servers $S_1$ and $S_2$ with
constant, but different, service times $T_1$ and $T_2$,
respectively, with $T_2 > T_1$. The arrival process is Poisson
with rate $\lambda < \frac{1}{T_1} + \frac{1}{T_2}$. We wish to find the optimal
policy for server activation that minimizes the average
customer sojourn time in the system.

II.  Background 

This is a problem that has been well studied for
the case of exponential servers [1-3]. The optimal
policy has been shown to consist of always keeping the
fast server busy, as long as the queue is non-empty,
and of activating the slow server when the queue size
is greater than a threshold $m_0$, the value of which
depends on $\lambda$ and on the service rates. Extension of
the result to more complicated systems has been, in
general, difficult. The motivation for considering the
deterministic case is that in packet-switched systems
the transmission time of each packet is constant so
long as the transmission bandwidth remains constant.

III.  Approach 

Let $x_0(t)$ denote the queue size at time $t$ and $r_i(t)$
the residual service time of server $S_i$, $i = 1, 2$; clearly,
$0 \le r_i(t) \le T_i$. The vector $x(t) = (x_0(t), r_1(t), r_2(t))$
is a Markovian state description of the system. We
let $x_i(t)$ be 0 if $r_i(t) = 0$, and 1 otherwise. The
total number of customers is then given by
$|x(t)| = x_0(t) + x_1(t) + x_2(t)$. Let $\pi$ be a control policy that
decides at every $t \ge 0$ which idle server to activate
based on $\{x(s), 0 \le s \le t\}$. The policy $\pi$ is optimal if
it minimizes the long-run average cost $J_\pi(x)$, where
$J_\pi(x) = \limsup_{T\to\infty} \frac{1}{T} E_x^\pi [\int_0^T |x(t)|\, dt]$
and $x$ is the initial state.


Markov Decision Theory cannot be used easily
to establish optimality conditions here because of the
continuity of the transitions in the residual service
times. However, we use the special features of the
deterministic service times to obtain necessary conditions
for the optimal Markovian policy that coincide
with the properties of the optimal policy of the
exponential service case.

Furthermore, we obtain lower and upper bounds
to the threshold value for the class of threshold policies.

IV.  Results 

The optimal policy is shown to (i) activate at
least one of the servers without delay if they are both
idle, (ii) activate the fast server immediately if it is
idle and the slow server is busy, and (iii) activate the
fast server before the slow server if both are idle.
Furthermore, the optimal Markov policy that activates
the slow server when the system is in states
$(y, r_1, 0)$ and $(z, r_1, 0)$ for $y < z$ must also activate the
slow server for any state $(x, r_1, 0)$ with $y < x < z$.
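
A threshold policy of the kind described above can be sketched as a single decision function. This is an illustrative sketch only: the function name is hypothetical, and comparing the queue with the threshold after a customer has been handed to the fast server is an assumption, since the text does not fix that detail.

```python
def servers_to_activate(queue_len, fast_idle, slow_idle, threshold):
    """Return the servers a threshold policy would start, fast server first.

    The fast server is started whenever it is idle and a customer waits.
    The slow server is started only if the remaining queue exceeds the
    threshold (assumed convention: compare after serving the fast server).
    """
    started = []
    remaining = queue_len
    if fast_idle and remaining > 0:
        started.append("fast")
        remaining -= 1
    if slow_idle and remaining > threshold:
        started.append("slow")
    return started
```

For example, with a threshold of 2 and both servers idle, a single waiting customer goes to the fast server only, while five waiting customers start both servers.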

The  lower  bound  to  the  threshold  value  is  given 
by  1  —  -jf  +  —  1)(1  —  ATi),  and  the  upper 

bound  by  1  +  .  The  method  for  computing  the 

thresholds applies to the exponential case as well and
the results are consistent with the exact threshold
calculations in [3].

Abstract - This work considers packet radio networks
in which users transmit using spread-spectrum
multiple-access (SSMA) signaling, slotted ALOHA
random access, and forward-error-correction (FEC).
It is well known that these networks can exhibit
bistable behavior similar to narrowband ALOHA systems.
In this work, we analyze the impact of FEC parameters
on the throughput, delay, and drift of slow
frequency-hop (FH)/SSMA networks. We present
exact expressions for throughput, delay, and drift,
and, furthermore, characterize bistable systems by
their first exit times (FET). Drift analysis suggests
an approach to eliminating bistability that involves
increasing both the average retransmission time and
the blocklength. Numerical examples are provided
to illustrate our approach. A noteworthy feature of
our approach is that the elimination of bistability is
achieved by careful selection of system parameters
without using active control.

I.  Introduction 

Consider a network in which a population of N transmitters
(or users) and N receivers share a common radio channel.
The network topology is assumed to be fully connected with
paired-off transmissions; each transmitter communicates with
a single, unique receiver. Each user is fed by a bursty message
source. Users transmit messages in the form of packets
using spread-spectrum multiple-access (SSMA) signaling,
slotted ALOHA random access, and forward-error-correction
(FEC). We assume that feedback is present and that the
feedback propagation delay is negligible in comparison to the
packet transmission time.

It is well known that such networks can exhibit bistable
behavior similar to narrowband ALOHA systems [1]. A bistable
system possesses two locally stable equilibria with the system
achieving high throughput and small delay at one (operating
point) and low throughput and large delay at the other
(saturation point). In practice, a bistable network can remain in
saturation for large time periods, thus leading to poor
performance. Consequently, the elimination of bistability, by which
we mean removing the saturation point while retaining the
throughput-delay performance at the operating point, is of
importance in practical networks.

Various stabilization techniques, which aim to prevent the
network from reaching saturation, have been examined in the
literature. Notably, these techniques rely either on recursive
retransmission control (e.g., [2]), wherein an estimate of the
total number of backlogged users in the network is employed
by each user to alter a design parameter, such as the
retransmission probability or code rate, or else on centralized control
of user transmissions (e.g., [3]).

This work was supported by the National Science Foundation

Our focus, in this work, is on investigating the impact of
FEC code parameters on the throughput, delay, and drift
of SSMA networks. For concreteness, we consider slow
frequency-hop (FH)/SSMA signaling with Reed-Solomon
erasure correction [4]. We present a model which is exact for
finite user populations and expressions for throughput, delay,
and drift. Moreover, we characterize bistable systems by their
first exit times (FET), which is a measure of the average time
taken by a bistable network to reach saturation, starting from
zero user backlog. Also, a simpler limiting model is presented
which may be used when both the number of users and the
number of frequency bins are large. We then present our
approach to eliminating bistability based on a drift analysis of
the limiting model. Finally, numerical examples are provided
to illustrate our approach.

II.  Conclusions 

The  following  four  observations  apply  to  a  bistable 
FH/SSMA  network: 

(1) Increasing the code blocklength leads to higher throughput
and lower delay at the operating point at the cost of
smaller FET.

(2)  Increasing  the  average  time  to  retransmission  leads  to 
larger  FET  at  the  cost  of  lower  throughput  and  higher  delay 
at  the  operating  point. 

(3)  Elimination  of  bistability  necessitates  increasing  both 
the  blocklength  and  the  average  time  to  retransmission. 

(4)  At  fixed  blocklength,  further  improvement  in  operating 
point  performance  can  be  achieved  by  optimizing  over  code 
rate. 

From (3), we infer that it is possible to eliminate bistability
by careful selection of network design parameters without the
use of active (decentralized or centralized) control.

Summary

In our earlier work [1, 2] we applied the diffusion
approximation method to the steady-state analysis of
an ALOHA random access protocol and of non-Markovian
queueing networks. More recently, we have successfully
applied the diffusion model in analyzing the transient
behavior of a statistical multiplexer, and in deriving a
simple formula for the effective bandwidth of a bursty
traffic source in an ATM (asynchronous transfer mode)
network [3].

In this paper, we present a transient analysis of media
access control (MAC) protocols such as slotted ALOHA
and CSMA/CD (Ethernet) by formulating their queue
behavior as an Ornstein-Uhlenbeck process $X(t)$. We
also derive an important result on the transient mean
$m_X(t) = E[X(t)]$: if the drift coefficient $\beta(x,t)$ is a
linear function of the system congestion $x = X(t)$, then
the $m_X(t)$ that we obtain under the assumption of a
homogeneous diffusion coefficient is an unbiased estimate
of $m_N(t)$, the mean of the original process $N(t)$, and is
independent of the diffusion coefficient $\alpha(x,t)$.


Figure 1: The number of backlogged users, $N(t)$, in the
slotted ALOHA system

Figure 1 is a queueing model representation of the
slotted ALOHA system being considered. There are $K$
users in the system. $N(t)$ is the number of "backlogged"
users who are either engaged in actual transmission
or waiting for a retransmission at the $t$-th slot
time. The parameter $r$ is the probability that a
retransmission will take place at a given time slot. The
remaining $K - N(t)$ users are in the "user response"
state, ready to generate a new packet with probability
$v$ in a given slot.

We approximate the integer-valued process $N(t)$ by
a continuous-state diffusion process $X(t)$, which has the
drift coefficient $\beta(x,t) = (K - x(t))v - S(x(t))$ and the
diffusion coefficient $\alpha(x,t) = (K - x(t))v + S(x(t))$,
where $S(x(t))$ represents the channel throughput when
the process $X(t)$ is in state $x$ at time $t$.
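
The drift and diffusion coefficients can be evaluated numerically once a throughput function is chosen. The sketch below assumes the textbook slotted ALOHA form in which a slot succeeds when exactly one user transmits; that form of S, and all function names, are assumptions for illustration and are not given in the source.

```python
def throughput(n, K, v, r):
    """Probability that exactly one of the K users transmits in a slot.

    Assumed model: each of the n backlogged users transmits with
    probability r, each of the other K - n users with probability v.
    """
    idle_new = (1 - v) ** (K - n)
    idle_backlogged = (1 - r) ** n
    one_backlogged = n * r * (1 - r) ** (n - 1) if n > 0 else 0.0
    one_new = (K - n) * v * (1 - v) ** (K - n - 1) if n < K else 0.0
    return one_backlogged * idle_new + one_new * idle_backlogged


def drift(n, K, v, r):
    """beta = (K - n) v - S(n): expected change of the backlog per slot."""
    return (K - n) * v - throughput(n, K, v, r)


def diffusion(n, K, v, r):
    """alpha = (K - n) v + S(n)."""
    return (K - n) * v + throughput(n, K, v, r)
```

A backlog value at which the drift vanishes is an equilibrium of the approximating process; a negative slope of the drift there indicates a locally stable one.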

For analytical tractability, we simplify $\beta(x,t)$ and
$\alpha(x,t)$ as follows: Suppose that $S(x)$ (hence $\beta(x,t)$, as
well) can be approximated by a linear function of $x(t)$,
i.e., $\beta(x,t) \approx \beta_0 - \beta_1 x(t)$, and that $\alpha(x,t)$ is nearly
constant around some value $\bar{x}$, i.e., $\alpha(x,t) \approx \alpha_0$. Then the
diffusion process $X(t)$ becomes an Ornstein-Uhlenbeck
process, and we can readily obtain the time-dependent
solution $f(x,t|x_0)$: it is a Gauss-Markov process with
time-dependent mean $m_X(t) = \bar{x}(1 - e^{-\beta_1 t}) + x_0 e^{-\beta_1 t}$
and variance $\sigma_X^2(t) = \frac{\alpha_0}{2\beta_1}(1 - e^{-2\beta_1 t})$.
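
The transient mean and variance of the Ornstein-Uhlenbeck approximation are simple to evaluate. The sketch below codes the standard Ornstein-Uhlenbeck expressions for a linear drift with slope beta1 and a constant diffusion coefficient alpha0; the function names and the parameter values in the example are hypothetical.

```python
import math


def ou_mean(t, x0, x_bar, beta1):
    """Transient mean: x_bar * (1 - exp(-beta1 t)) + x0 * exp(-beta1 t)."""
    decay = math.exp(-beta1 * t)
    return x_bar * (1.0 - decay) + x0 * decay


def ou_variance(t, alpha0, beta1):
    """Transient variance: alpha0 / (2 beta1) * (1 - exp(-2 beta1 t))."""
    return alpha0 / (2.0 * beta1) * (1.0 - math.exp(-2.0 * beta1 * t))


# Hypothetical parameters: start at backlog 5 and relax towards 2.
mean_path = [ou_mean(t, x0=5.0, x_bar=2.0, beta1=0.5) for t in range(0, 21, 5)]
```

The mean relaxes from the initial backlog to the long-run level at rate beta1, and it does not involve alpha0, in line with the independence property stated in the summary.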
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Abstract  —  In  this  paper  we  extend  further  the  re¬ 
sults  of  [1]  and  [2]  to  CDMA  networks  with  truly 
multi-rate  multi-media  traffic.  Voice,  high  and  low 
priority  data,  and  possibly  videophone  traffic  all 
use  DS/CDMA  modulation  but  different  information 
(data)  rates. 

I.  Introduction 

In our recent work of [1] we introduced a Markovian formulation
for the problem of optimal admission of voice users in
a CDMA network; there was also data traffic in the CDMA
network but it had lower priority than voice and was allowed
to use the CDMA codes left unused by the voice users. In
[2] we extended this work to CDMA networks with voice and
multi-priority data; there are two classes of data users, those
with high priority (same as voice) that require real-time
delivery (but lower BER than voice) and those with low priority
that are treated as in [1]. In the work of both [1] and [2]
voice and data users transmit at the same data (information)
rate and employ the same processing gain (number of chips
per bit) in their DS/CDMA modulation. Finally, in our work
of [3] we provided a preliminary analysis of a CDMA system
supporting true multi-rate multi-media traffic. In [3] multiple
chip rates are used to spread the user signals in proportion to
their information data rates rather than spreading them over
the entire frequency band. In this paper we extend further
the results of [1] and [2] to CDMA networks with truly
multi-rate multi-media traffic. Voice, high and low priority data,
and possibly videophone traffic all use DS/CDMA modulation
but different information (data) rates. Since the signals
of all users are spread over the entire bandwidth, the different
data rates result in different processing gains.

II.  Multi-Rate  CDMA  System 

The different processing gains affect the other-user interference
in a more complicated way than in traditional CDMA
interference evaluation. However, both an approximate analysis
based on the Gaussian approximation and another analysis
with any desirable accuracy based on the characteristic function
method are carried out in order to determine the BER of
the various traffic types as functions of the information rates,
the overall bandwidth, and the number of users in each traffic
class. This evaluation is used to determine the capacity region,
that is, the maximum number of users that can be supported
from each class so that the individual class BER requirements
are met. This calculation is carried out for (i) voice and data
of different information rates and BERs, (ii) voice and two
types of data, all with different information rates and BERs,
and (iii) videophone, voice, and data with different rates and
BERs.

III.  Code  Allocation  for  Multi-Media  Traffic 

Three types of policies for optimal CDMA code allocation
are derived: one that pertains to voice and low priority data,
another that pertains to voice, high priority data, and low
priority data, and a third one that pertains to videophone,
voice, and low priority data.

For the first policy we present an optimal allocation scheme
that determines the number of newly arrived voice calls that
are accepted in the network so that the long-term blocking
(rejection) rate of voice calls is minimized and the packet error
probabilities of voice traffic remain within acceptable limits.
The unused CDMA capacity is used by data traffic and the
remaining data traffic is queued. We consider two models for
the effects of other-user interference: the threshold model and
the graceful degradation model.

For the second policy we derive an optimal code allocation
scheme that determines the number of newly arrived voice
calls and data users with high priority that are accepted in
the network so that the long-term weighted blocking rates of
voice calls and data traffic are minimized and the packet error
probabilities of these two traffic types are within acceptable
limits. For the lower priority data we consider two policies.
According to the first policy there are no CDMA codes
reserved for these data; they get assigned CDMA codes only
when the combined voice and high priority data traffic leaves
certain codes unused. The second policy operates like the first
except that there is also a small number of CDMA codes that
are always assigned to low priority traffic. For both schemes
the BER requirement for the low priority data traffic is met.

The performance measures are the average blocking rates
and average throughputs of the voice calls and all data
messages as functions of the offered voice and data traffic loads
under the proposed optimal code allocation policy. The queueing
delay and the packet loss probability of the low priority
data traffic are also evaluated. A semi-Markov decision process
(SMDP) with guaranteed BERs for voice and data traffic is
used for formulating the system operation as a dynamic code
assignment problem. A value-iteration algorithm is applied to
this SMDP to derive the optimal policy.

The third policy has many similarities in its operation with
the second policy, but it operates on videophone traffic instead
of high priority data traffic; this fact is reflected in the
multi-state Markovian model used (more complicated than the one
used for data) and the different BER requirements.

Abstract - We consider a slotted ring that allows
simultaneous transmissions of messages by different
users, known as a ring with spatial reuse. To alleviate
fairness problems that arise in such networks, policies
have been proposed that operate in cycles and guarantee
that a certain number of packets, called the quota, will
be transmitted by every node in every cycle. We provide
sufficient and necessary stability conditions for such
rings.

I.  Introduction 

We consider a ring with spatial reuse, i.e., a ring in which
multiple simultaneous transmissions are allowed as long as
they take place over different links (cf. [1, 2, 3]). Time is
divided in slots and each slot is equal to the smallest
transmission unit, called a packet. A node receiving a packet with
destination another node in the ring (ring packet) may
retransmit the packet in the outgoing link in the same slot.

While rings with spatial reuse have higher throughput than
standard token passing rings, they also introduce the possibility
that some overloaded nodes may block other nodes from
accessing the ring. To avoid this problem, the following policy
is proposed in [1, 2] for the operation of the ring. Each node
is assigned a number called "quota". The policy operates in
cycles. A node is allowed to transmit during a cycle as long
as the number of transmitted packets does not exceed its
assigned quota. A cycle ends when the quotas of all nodes are
delivered to their destinations. In this way, the operation of
a node with regular traffic requirements is not adversely
affected by nodes that may become overloaded. An analysis of
the throughput characteristics of this policy is presented in [3],
where it is also shown that if the end-to-end throughput
requirements result in aggregate traffic load for each link of the
network less than one, then the node quotas can be selected
to achieve these throughput requirements.
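
The per-link load condition mentioned above can be checked with a short sketch. It assumes a unidirectional ring in which a packet from node i to node j traverses links i, i+1, ..., j-1 (modulo the ring size), with link k running from node k to node k+1; this topology, the function names and the sample rates are assumptions for illustration.

```python
def link_loads(rates):
    """Aggregate load on each link of a unidirectional ring.

    rates[i][j] is the end-to-end packet rate from node i to node j
    (packets per slot). Link k connects node k to node (k + 1) % n.
    """
    n = len(rates)
    loads = [0.0] * n
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            node = src
            while node != dst:  # walk the ring from src until dst is reached
                loads[node] += rates[src][dst]
                node = (node + 1) % n
    return loads


def loads_below_one(rates):
    """True when every link carries an aggregate load strictly below one."""
    return all(load < 1.0 for load in link_loads(rates))


# Hypothetical three-node example.
rates = [[0.0, 0.0, 0.4],
         [0.0, 0.0, 0.3],
         [0.5, 0.0, 0.0]]
```

Here the flow from node 0 to node 2 uses two links, so it contributes to the load of both.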

II.  Main  Results 

The primary goal of this work is to obtain the stability
region of the ring network with finite quota and to compare
it with the maximum achievable stability region for such ring
networks (cf. [3]). The second motivation is to extend our
stability approach of multidimensional distributed systems
developed in Georgiadis and Szpankowski [4, 5] and Szpankowski [6]
to ring networks with spatial reuse. The conditions for
stability are derived by means of a technique that is based on an
application of mathematical induction, stochastic monotonicity
properties and Loynes stability criteria. A special technique,
based on the structure of the complement of the stability
region and the construction of a dominant system, permits the
derivation of the necessary stability conditions from the
instability condition of a dominant system. The general steps of
the above stability analysis have been applied to the analysis
of other systems as well (cf. [4, 5, 6]). It should be stressed
that this general construction of [4, 6] requires detailed and
subtle modifications for almost every queueing network which
may be far from trivial, and this analysis is a typical example.
In addition, we provide a decomposition and characterization
of the instability region of the system.

Research supported in part by NSF under grant NCR-9211417.

The exact computation of the stability region depends on
the distribution of the arrival processes and this often
renders this computation intractable. The dependence on the
distribution leads us to the introduction of the notions of the
essential and absolute stability region. The first contains any
arrival rate vector such that for every distribution with this
arrival rate vector the network is stable. The second contains
any arrival rate vector for which there exists some
distribution with this arrival rate vector under which the network is
stable. Both stability regions have interesting practical
implications. If the arrival distribution is not known, then the
essential stability region is essentially the operational region
of the system. The absolute stability region specifies what is
achievable when the arrival streams can be shaped to have
the statistics which lead to higher throughput. We present
a method based on linear programming that permits the
development of upper and lower bounds on the stability region
using only the knowledge of the average cycle lengths. For
the case of two nodes we provide a closed-form expression for
the region of arrival rates where the system is stable for any
arrival distribution.

Abstract - This paper presents the results of a capacity
evaluation for a cellular code-division multiple-access
(CDMA) system over a wide-band fading
channel. The study compares the performance of a
system based on a conventional matched filter with
that based on an adaptive receiver. Trellis codes and
various rate convolutional codes are investigated in an
attempt to improve the system performance. Performance
is measured in terms of the maximum number
of simultaneous users per cell for a given bit error rate
(BER). The channel model is developed from measured
propagation data at 2.6 GHz in heavily built-up
urban areas and includes the effect of both intra-cell
and inter-cell interferers.

I.  Summary 

One of the major problems associated with using direct
sequence CDMA systems is the low channel efficiency. For single
cell systems based on conventional, matched filter receivers,
the efficiency is typically between 10-20% [1], [2] of the
theoretical channel capacity. By using an adaptive linear receiver
[1], [3], it is possible to increase the system efficiency of a
single cell system significantly with only a moderate increase
in receiver complexity. This is also the case for a multi-cell
system operating over a multipath fading channel.

The poor efficiency of the matched filter system is primarily
due to the multiple access interference (MAI) produced by
competing users of the channel bandwidth. Measurements
[4] indicate that MAI coming from adjacent cells contributes
up to 40% of the total interference, for equally loaded cells,
experienced by a given user. For this reason it is important
for a capacity study to examine a system which includes the
interference from surrounding cells.

The efficiency of the cellular system will be defined as the
maximum number of users that may be supported in one cell,
while maintaining a specified BER, multiplied by the data rate
of each user and divided by the bandwidth:

$$\eta = \frac{M r_d}{B}$$

where $M$ is the number of simultaneous users at a given bit
error probability, $r_d$ is the data rate and $B$ is the
spread-spectrum bandwidth. For this calculation, it is assumed
that the same number of simultaneous users exist in all cells and
all users have the same data rate. This gives an indication of the
performance of the system normalised to one cell.
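
As a rough illustration of the efficiency definition, the sketch below evaluates $\eta = M r_d / B$ for one hypothetical operating point. The user count and the way the bandwidth is derived from the spreading ratio and roll-off are assumptions made for the example, not figures reported in the paper.

```python
def cell_efficiency(num_users, data_rate_bps, bandwidth_hz):
    """Efficiency eta = M * r_d / B for one cell."""
    return num_users * data_rate_bps / bandwidth_hz


# Hypothetical operating point (assumed for illustration only):
# spreading ratio 127, 39.4 kbps per user, and a bandwidth taken as
# chip rate times (1 + roll-off) with a 40% roll-off.
spreading_ratio = 127
data_rate = 39.4e3
bandwidth = spreading_ratio * data_rate * 1.4
users = 89  # assumed user count, roughly 70% of the spreading ratio

eta = cell_efficiency(users, data_rate, bandwidth)
print(f"efficiency = {eta:.3f}")
```

Because the data rate cancels under this bandwidth assumption, the result depends only on the ratio of users to spreading ratio and on the roll-off.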

The spreading sequences considered in this paper are Gold
sequences of length N = 15 to 127. For code rates of 1/2,
1/4 and 1/8, the spreading ratios change to 63, 31 and 15
respectively in order to maintain constant bandwidth. With
trellis coding, the spreading ratios do not change. The system
considered has a BER of $10^{-3}$ and a data rate of 39.4 kbps.
The transmitted signal is band-limited by a raised cosine filter
with a roll-off of 40%. The average power of the combined
signals of interferers from each adjacent cell is varied from
-10 dB to -15 dB relative to the power of the signal of the user
of interest. Due to imperfect power control, the signal powers of
the users in the cell of interest are assumed to be normally
distributed with a variance of 1 dB.

Simulation results have shown that the uncoded system
based on the adaptive receiver leads to a 4- to 5-fold
improvement in system efficiency over the fading channel
compared to the matched filter system.

When  combining  the  systems  with  convolutional  codes  it 
was  seen  that,  while  offering  significant  advantages  for  the 
matched  filter  system,  this  form  of  coding  actually  reduces 
the  system  efficiency  for  the  adaptive  receiver  system  by  re¬ 
stricting  the  maximum  number  of  simultaneous  users.  As  the 
code  rate  decreases  from  1/2  to  1/8,  the  maximum  number 
of  users  for  the  matched  filter  system  eventually  equals  the 
number  for  the  adaptive  receiver  system. 

The uncoded matched filter system offers a very low
efficiency, with the maximum number of users well below the
spreading ratio N. The gain from coding allows a larger number
of simultaneous users; however, the number is still well
below the spreading ratio. The uncoded adaptive receiver
system, by contrast, offers a maximum number of users which is
approximately 70% of the spreading ratio. The decrease in
spreading ratio which accompanies a decrease in code rate
causes the maximum number of users to decrease linearly.
The gain from coding is insufficient to increase the number
of users to that of the uncoded case. This is true for all code
complexities examined.
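
The constant-bandwidth trade-off can be sketched numerically. The helper below reproduces the spreading ratios quoted in the text (127 becoming 63, 31 and 15) under the assumption that the sequence length plus one scales with the code rate; that rule and the function names are illustrative, and the 70% figure is applied only to the uncoded adaptive receiver case that the text describes.

```python
from fractions import Fraction


def coded_spreading_ratio(uncoded_ratio, code_rate):
    """Spreading ratio that keeps the bandwidth constant for a given code rate.

    Assumes lengths of the form 2^n - 1, so (N + 1) scales with the rate.
    """
    return int((uncoded_ratio + 1) * Fraction(code_rate)) - 1


def uncoded_adaptive_users(spreading_ratio, fraction=0.7):
    """Approximate user limit of the uncoded adaptive receiver (about 70% of N)."""
    return round(fraction * spreading_ratio)


ratios = [coded_spreading_ratio(127, Fraction(1, d)) for d in (2, 4, 8)]
print(ratios)
print(uncoded_adaptive_users(127))
```

The shrinking ratios show why a lower code rate caps the number of users that an adaptive receiver can separate, whatever the coding gain.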

For both systems, however, convolutional coding does offer
the advantage of a reduction in the signal to noise ratio (SNR)
required to achieve a given BER for lightly loaded systems.

Trellis codes, conversely, require no reduction in spreading
ratio and so the effect described above is minimised. Trellis
coding allows an increase in efficiency for both systems as well
as the coding gain described above for lightly loaded systems.

It  is  therefore  suggested  that  convolutional  coding  should 
not  be  considered  when  attempting  to  maximise  efficiency  for 
systems  based  on  adaptive  receivers.  Rather,  trellis  codes  offer 
a  far  better  alternative. 
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Abstract  Quadratic-inverse  spectrum  estimates  for 
locally  white  non-stationary  processes  are  described. 

I.  Introduction 

Most time-series data encountered in practice are
non-stationary, whereas most spectrum estimation methods
assume stationarity, and this has resulted in many ad hoc
analysis methods. The multiple-window method
of spectrum estimation [1] is akin to linear inverse theory
applied to the discrete Fourier transform. In the
original derivation, both stationarity and "local whiteness"
were assumed, and the resulting estimates of
spectra were simply squares of the linear inverse.
Quadratic-inverse theory [2] eliminated the locally white
assumption and gave stable second moments. The
effect of limited non-stationarity on the variance of
such estimates is known [3]. This talk describes a way
of combining the non-stationary quadratic-inverse theory
with that of harmonizable processes.

II.  Harmonizable  Processes 

Harmonizable processes may be written as generalized
Fourier transforms,

$$x(t) = \int e^{i 2\pi \eta t}\, d\xi(\eta)$$

with covariance function $\Gamma_c(t, t') = E\{x(t)\, x^*(t')\}$
and corresponding generalized spectral density
$\gamma_c(\eta, \eta')\, d\eta\, d\eta' = E\{d\xi(\eta)\, d\xi^*(\eta')\}$.
They are connected by the two-dimensional Wiener-Khintchine
relation

$$\Gamma_c(t, t') = \iint e^{i 2\pi (t\eta - t'\eta')}\, \gamma_c(\eta, \eta')\, d\eta\, d\eta'.$$

Local stationarity is described by the spectrum parallel
to the $\eta = \eta'$ diagonal, while global non-stationarity is
on the orthogonal coordinate. Wigner-Ville and
dynamic spectra are obtained from 45° coordinate rotations
followed by single Fourier transforms [4]. For
white non-stationary processes the covariance function
is $\Gamma_c(t, t') = P(t)\, \delta(t - t')$, so the corresponding
generalized spectrum is a function of $\eta - \eta'$. $P(t)$ is the
expected power of the process at time $t$.

III. Estimation

To estimate $P(t)$ we use a multiple-window method.
Thus one chooses a frequency $f$ and a bandwidth $W$ and
describes the information contained in the band
$(f - W, f + W)$ by the coefficients of a locally orthogonal
expansion of Slepian functions. Given $N$ samples
from an observed sequence $x(t)$ one chooses a resolution
bandwidth $W$ and computes the eigencoefficients

$$x_k(f) = \sum_{n=0}^{N-1} e^{-i 2\pi f n}\, v_n^{(k)}(N, W)\, x(n) \qquad (1)$$

where the $v_n^{(k)}(N, W)$ are the orthonormal discrete prolate
spheroidal sequences. The extreme band-limiting
properties of the Slepian sequences make it almost
irrelevant whether the white non-stationary model is
valid globally, or only locally within the frequency
band $(f - W, f + W)$. Denote the vector of eigencoefficients
(1) by $X = X(f)$ and the covariance matrix of
the eigencoefficients by $C_{jk}(f) = E\{x_j(f)\, x_k^*(f)\}$.

To estimate limited non-stationarity, expand $P(t)$
on an orthonormal basis

$$P(t) = \sum_{l=0} p_l(f)\, A_l(t).$$

The system whose kernel is the square of the sinc kernel
defining the Slepian sequences

$$\alpha_l A_l(n) = \sum_{m=0}^{N-1} \left[ \frac{\sin 2\pi W (n - m)}{\pi (n - m)} \right]^2 A_l(m)$$

has $4NW$ real eigensequences corresponding to significantly
non-zero eigenvalues. The corresponding basis
matrices defined by

$$A^{(l)}_{jk} = \sqrt{\lambda_j \lambda_k}\, \sum_{n=0}^{N-1} v_n^{(j)}\, v_n^{(k)}\, A_l(n)$$

are trace-orthogonal, $\mathrm{tr}\{A^{(l)} A^{(m)}\} = \alpha_l\, \delta_{l,m}$.

The covariance matrix $C$ may now be written

$$C(f) = \sum_{l=0} p_l(f)\, A^{(l)}$$

so quadratic estimates of the expansion coefficients

$$\hat{p}_l(f) = \frac{1}{\alpha_l}\, \mathrm{tr}\{\hat{C}(f)\, A^{(l)}\} = \frac{1}{\alpha_l}\, X^{\dagger}(f)\, A^{(l)}\, X(f)$$

follow. $p_0(f)$ is proportional to the usual stationary
spectrum $S(f)$, while $p_1(f)$ is, loosely, the time
derivative of the spectrum $S(f)$. It may be shown that
these estimates are unbiased and, because
$\mathrm{Var}\{\hat{p}_l(f)\} \sim \alpha_l^{-1}$, time resolution of non-stationarity
is essentially limited to $1/(4W)$.
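
A minimal sketch of the eigencoefficient computation in (1), assuming NumPy is available. The Slepian sequences are obtained here as the leading eigenvectors of the sinc kernel matrix; this is one common construction and not necessarily how the author computed them. The function names and the test signal are illustrative.

```python
import numpy as np


def slepian_sequences(num_samples, half_bandwidth, num_tapers):
    """Leading eigenpairs of the kernel sin(2*pi*W*(n-m)) / (pi*(n-m))."""
    n = np.arange(num_samples)
    lag = n[:, None] - n[None, :]
    # np.sinc(x) = sin(pi x) / (pi x), so this equals the sinc kernel,
    # with the diagonal taking its limiting value 2W.
    kernel = 2.0 * half_bandwidth * np.sinc(2.0 * half_bandwidth * lag)
    eigenvalues, eigenvectors = np.linalg.eigh(kernel)
    order = np.argsort(eigenvalues)[::-1][:num_tapers]
    # Rows of the returned array are unit-norm sequences v^(k).
    return eigenvalues[order], eigenvectors[:, order].T


def eigencoefficients(x, f, half_bandwidth, num_tapers):
    """x_k(f) = sum_n exp(-i 2 pi f n) v_n^(k) x(n), for k = 0..K-1."""
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    _, tapers = slepian_sequences(len(x), half_bandwidth, num_tapers)
    return tapers @ (np.exp(-2j * np.pi * f * n) * x)


# Illustrative use: a sinusoid at f0 = 0.2 cycles/sample, N = 64, NW = 4.
samples = np.cos(2 * np.pi * 0.2 * np.arange(64))
in_band = eigencoefficients(samples, 0.2, 4 / 64, 5)
out_of_band = eigencoefficients(samples, 0.35, 4 / 64, 5)
print(abs(in_band[0]), abs(out_of_band[0]))
```

The in-band coefficient is much larger than the out-of-band one, which reflects the band-limiting property the text relies on.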
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Abstract  —  We  consider  the  problem  of  estimating 
the  min-norm  solution  to  a  low-rank,  linear  statistical 
model.  We  calculate  the  statistics  of  the  solution  as  a 
function  of  the  statistical  characterization  of  the  ma¬ 
trix  containing  observation  noise.  We  also  present  a 
new  method  for  estimating  the  rank  of  the  underlying 
noise-free  matrix. 

I.  Calculation  of  Bias  and  Variance 

Consider the following linear model

$$y = H_{m \times n}\, \theta, \qquad m > n. \qquad (1)$$

The parameter vector $\theta$ is to be estimated from data contained
in $H$ and $y$. In the absence of observation noise on the data,
we assume that the rank of $H$ is $r < n$. Thus we refer to (1) as
a low-rank model, and we are interested in the minimum-norm
solution for $\theta$. In practice, the data will contain perturbations
(noise), and we assume that both $H$ and $y$ are perturbed by
additive noise so that the observed data are $[H\ y] + N$. The
estimate $\hat{\theta}$ is obtained as the min-norm solution to the
low-rank model equation $\hat{H}\theta \approx \hat{y}$, where $[\hat{H}\ \hat{y}]$ is obtained
from a singular value decomposition of $[H\ y] + N$.
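
One plausible reading of this procedure can be sketched as follows, assuming NumPy is available: truncate the SVD of the observed augmented matrix to rank r, split the result back into its H and y parts, and take the pseudo-inverse solution. The function name and the tolerance are illustrative choices, and the authors' exact estimator may differ in detail.

```python
import numpy as np


def min_norm_low_rank(H_obs, y_obs, rank):
    """Min-norm solution from a rank-r approximation of [H y] + N."""
    augmented = np.column_stack([H_obs, y_obs])
    U, s, Vt = np.linalg.svd(augmented, full_matrices=False)
    # Keep only the r dominant singular components.
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    H_hat, y_hat = low_rank[:, :-1], low_rank[:, -1]
    # The pseudo-inverse gives the minimum-norm least-squares solution.
    # The explicit cutoff discards round-off level singular values.
    return np.linalg.pinv(H_hat, rcond=1e-10) @ y_hat


# Noise-free check with a rank-2 model (m = 8, n = 4).
rng = np.random.default_rng(0)
H = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 4))
y = H @ rng.standard_normal(4)
theta_hat = min_norm_low_rank(H, y, 2)
print(np.linalg.norm(H @ theta_hat - y))
```

In the noise-free case the estimate coincides with the ordinary minimum-norm solution, which gives a simple sanity check before noise is added.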

Our results are based on a perturbation expansion for the
SVD of a finite-size matrix [1]. We have previously applied
these matrix perturbation ideas to adaptive detection [2] and to
performance analysis of array signal processing algorithms [3].
In order to be useful, the perturbed subspaces must not be
"too far" from the unperturbed subspaces. This will be true
if the noise matrix N is "small enough." We have quantified
the concepts of "too far" and "small enough" in our previous
analysis of the threshold effect [4]. First- and second-order
expressions for the perturbed subspaces were derived in [5].

In this paper, we use the perturbation formulas to calculate
the statistics (bias and variance) of the solution $\hat{\theta}$ as a function
of the statistical characterization of N, the matrix containing
observation noise. We stress that the perturbation formulas
do not require the data record to become large. In addition,
this approach can handle arbitrary correlation of the elements
of N.

II.  Order  Determination 

In  the  estimation  problem  discussed  above,  a  low-rank  ap¬ 
proximation  to  the  data  matrix  is  utilized  to  draw  an  infer¬ 
ence.  In  doing  so,  it  is  implicitly  assumed  that  the  underlying 
true  rank  of  the  data  matrix  is  known.  In  practice,  this  is 
seldom  the  case  and  the  underlying  true  rank  is  unknown  and 
needs  to  be  determined. 

1This  work  was  supported  in  part  by  NUWC,  Division  Newport, 
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Under  high  SNR  conditions,  the  perturbed  signal  subspace 
is  stable  and  is  more  or  less  determined  by  the  underlying 
(noise-free)  signal  subspace.  This  in  turn  stabilizes  the  per¬ 
turbed  orthogonal  subspace.  Hence  in  different  realizations 
of  the  data,  the  singular  vectors  change  erratically,  but  the 
space  spanned  by  them  remains  relatively  unchanged.  Thus 
the  energy  in  the  perturbed  orthogonal  subspace  is  also  well 
defined. It is closely related to the noise energy in the
orthogonal subspace. Using matrix perturbation approximations we
quantify this idea and evaluate the distribution to be a central
$\chi^2$ with $(m - r)(n - r)$ degrees of freedom.

Based on this distribution, we can set a threshold $T_r$ such
that $S_r < T_r$ with a probability $1 - \alpha$, where $\alpha$ is a small
positive number. In other words, if the rank is $r$ then $S_r$
can be well explained by the noise energy alone, and will be
below this threshold with high probability. If the rank is $r + 1$
or greater then, due to the additional signal energy, $S_r$ will
exceed the threshold with high probability. Based on this
idea, we develop a recursive procedure on the set of sums of
squares of singular values of the data matrix that is essentially a
signal energy detection procedure in enlarging subspaces.
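
The threshold test can be sketched as below, under several assumptions that the text does not spell out: $S_r$ is taken to be the sum of squared singular values beyond the first r, the noise entries are taken as i.i.d. Gaussian with known variance, and the chi-square quantile is approximated with the Wilson-Hilferty formula. The function names are hypothetical.

```python
from statistics import NormalDist


def chi2_quantile(prob, dof):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def estimate_rank(singular_values, m, n, noise_var, alpha=0.01):
    """Smallest r whose residual energy S_r falls below the threshold T_r."""
    squared = [s * s for s in singular_values]
    for r in range(len(squared)):
        residual_energy = sum(squared[r:])  # assumed definition of S_r
        dof = (m - r) * (n - r)
        threshold = noise_var * chi2_quantile(1.0 - alpha, dof)
        if residual_energy < threshold:
            return r
    return len(squared)


# Illustrative singular values of a 20 x 5 data matrix with unit noise variance.
rank = estimate_rank([50.0, 30.0, 4.0, 3.5, 3.0], m=20, n=5, noise_var=1.0)
print(rank)
```

Stepping r upward until the residual energy is explained by noise alone mirrors the "enlarging subspaces" description, and alpha bounds the false alarm probability at each step.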

In  order  determination  there  are  two  basic  types  of  error 
probabilities,  error  due  to  overestimation  (false  alarm)  and 
error  due  to  underestimation  (miss).  The  proposed  method 
allows  the  user  to  set  a  bound  on  the  false  alarm  probability. 
The  user  can  determine  a  value  of  SNR  for  which  a  prescribed 
value  of  probability  of  detection  or  probability  of  net  error  can 
be  obtained.  Thus,  the  user  can  specify  the  conditions  under 
which  performance  goals,  specified  by  error  probabilities,  can 
be  obtained. 
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Abstract  —  A  new  algorithm  is  proposed  for  blind 
channel identification in impulsive signal environments,
where the signals are modeled as symmetric
α-stable processes. The Alpha-Spectrum, a new spectral
representation based on fractional lower-order
moments, is developed. Conditions for blind
identifiability of any FIR channel (non-minimum phase,
unknown order) are established using the properties
of the Alpha-Spectrum.

I.  Introduction 

The α-stable distributions are characterized by heavy tails and
infinite variance, except for the Gaussian distribution (α = 2),
which is the limiting case. By the Generalized Central Limit
Theorem, they are the only class of distributions that can be
the limiting distributions for sums of i.i.d. random variables.
These properties make the α-stable distributions attractive
models for impulsive data [1]. Most algorithms for blind
identification of FIR channels are based on second- or higher-order
statistics. When the signals are impulsive and modeled as
α-stable processes, these algorithms fail. We propose a robust
blind identification method for impulsive environments based on
a new spectral representation, the α-Spectrum. We
then prove the blind identifiability of any FIR channel
(non-minimum phase, unknown order) driven by white SαS (α > 1)
processes.

II. Blind Identification with the α-Spectrum

Our new blind identification method is based on the properties
of covariation, which plays a role analogous to covariance. For
two jointly SαS random variables X and Y with $1 < \alpha \le 2$,
covariation is defined by (see footnote 2)

$$[X, Y]_\alpha = \int_S x\, y^{\langle \alpha - 1 \rangle}\, \mu(ds),$$

where S is the unit circle and $\mu(\cdot)$ is the spectral measure
of the SαS random vector (X, Y). The FLOM (fractional
lower-order moment) estimator for covariation is

$$[X, Y]_\alpha = \frac{E(X\, Y^{\langle p - 1 \rangle})}{E(|Y|^p)}\, \gamma_y, \qquad 1 \le p < \alpha,$$

where X, Y are both real or isotropic complex SαS random
variables and $\gamma_y$ is the dispersion of Y [2]. Properties of
covariation include:

1. If $X_1$, $X_2$, $Y$ are jointly SαS, then
$[aX_1 + bX_2, Y]_\alpha = a[X_1, Y]_\alpha + b[X_2, Y]_\alpha$.

2. If $Y_1$, $Y_2$ are independent and $X$, $Y_1$, $Y_2$ are jointly
SαS, then $[X, aY_1 + bY_2]_\alpha = a^{\langle \alpha - 1 \rangle}[X, Y_1]_\alpha + b^{\langle \alpha - 1 \rangle}[X, Y_2]_\alpha$.

3. If X, Y are independent, then $[X, Y]_\alpha = 0$.
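
The FLOM estimator can be sketched for real-valued samples by replacing the expectations with sample averages. The function below returns the normalised quantity $[X, Y]_\alpha / \gamma_y$, so the dispersion does not need to be known; the function names and the sample values are illustrative.

```python
import math


def signed_power(value, exponent):
    """Real-valued y^<a> = |y|^a * sign(y)."""
    if value == 0:
        return 0.0
    return math.copysign(abs(value) ** exponent, value)


def flom_covariation_coefficient(x, y, p):
    """Sample estimate of E(X Y^<p-1>) / E(|Y|^p), i.e. [X, Y]_alpha / gamma_y."""
    numerator = sum(xi * signed_power(yi, p - 1.0) for xi, yi in zip(x, y))
    denominator = sum(abs(yi) ** p for yi in y)
    return numerator / denominator


# If X = 2.5 * Y, linearity in the first argument gives a coefficient of 2.5.
y_samples = [0.3, -1.2, 4.0, -0.7, 2.2]
x_samples = [2.5 * v for v in y_samples]
coefficient = flom_covariation_coefficient(x_samples, y_samples, p=1.2)
print(coefficient)
```

The estimator is linear in its first argument by construction, which matches property 1 of the covariation.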

1This  work  was  supported  by  the  Office  of  Naval  Research  under 
contract  N00014-92-J-1034. 

2The notation:

$$Y^{\langle p - 1 \rangle} = \begin{cases} |Y|^{p-2}\, Y^{*}, & Y \text{ complex}, \\ |Y|^{p-1}\, \mathrm{sign}(Y), & Y \text{ real}. \end{cases}$$


For a FIR channel $Y_n = \sum_{i=0}^{q} h_i X_{n-i}$, using the above
properties, we have:

$$S_\alpha(z) = \Big[\, Y_n,\ \sum_{i=-q}^{q} Y_{n-i}\, z^{i} \Big]_\alpha = \gamma_x\, H\big((1/z)^{\langle \alpha - 1 \rangle}\big)\, \big(H(z)\big)^{\langle \alpha - 1 \rangle} \qquad (1)$$

where $\gamma_x$ is the input dispersion and $H(z)$ is the filter
transfer function. Eq. (1) is of fundamental importance. We name
$S_\alpha(z)$ the α-Spectrum; with it, we can identify both
the magnitude and phase responses of the channel. The
magnitude can be obtained by letting $|z| = 1$, which gives
$S_\alpha(e^{j\omega}) = \gamma_x |H(e^{j\omega})|^{\alpha}$. To obtain the channel phase
response, note that the magnitude $|H(z)|$ and phase $\Phi(z)$ of any
FIR channel can be expressed in terms of $A^{(m)}$ and $B^{(m)}$,
which are determined by the zeros $\{a_i\}$ inside and $\{b_i\}$
outside the unit circle, respectively. $A^{(m)} + B^{(m)}$ determines
$|H(e^{j\omega})|$ and $A^{(m)} - B^{(m)}$ determines $\Phi(e^{j\omega})$ [3]. Taking the
logarithm of both sides of Eq. (1), we have:


log  |SQ(reJU/)|  =  alog  Ap-y^ 

m=  1 


m 


m—  1 


m 


sin(ma>), 


cos(ma;). 

(2) 

(3) 


where  |Sa(reJU>)|  and  fy(reju})  are  the  magnitude  and  phase 
of  the  a-spectrum,  and  pm(r)  =  rm(a“P  -f  (a  —  l)r  m, 
vm{r)  —  —  r"m.  a^-b^  can  ^ e  obtained  from 

either  |5a(reJu;)|  or  Therefore  the  channel  phase  re¬ 

sponse  is: 


*(0  =  E 


l  JT  (^(reJal)  +  ^(7eJal))  sin(»u>)dT 
Mr)  +  M7) 


sin  (no;), 

(4) 


*(o  =  e 


1 1  f;  lo§  eos{nQ)dU 


|Sa(reJQ)| 
Mn(r)  -  (in($) 


a(nu>), 


(5) 
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Abstract  —  Blind  identification  consists  of  estimat¬ 
ing  the  impulse  response  of  a  linear,  time-invariant 
channel  used  for  transmission  of  digital  data  by  ob¬ 
serving  the  channel  output  without  knowledge  of  the 
transmitted  symbol  sequence. 

The  aim  of  this  paper  is  twofold.  First  we  compare, 
in  order  to  assess  their  applicability  to  the  equaliza¬ 
tion  of  digital  radio  links  affected  by  selective  fad¬ 
ing,  some  recently  proposed  algorithms  based  on  the 
second-order  statistics  of  the  received  signal.  Further 
we  show  how  one  of  these  algorithms  can  be  modified 
to  account  for  correlated  noise. 

I.  Introduction 

By  blind  identification  we  mean  here  the  estimate  of  the  im¬ 
pulse  response  of  a  linear,  time-invariant  noisy  channel  used 
for  transmission  of  digital  data;  this  estimate  is  obtained 
by  observing  the  channel  output  without  knowledge  of  the 
transmitted  symbol  sequence. 

The  desirable  features  of  the  ultimate  blind  identification 
algorithm  are  the  following: 

•  Low  identification  error  in  the  presence  of  noise. 

•  Fast  convergence. 

•  Computational  simplicity. 

•  Insensitivity  to  data-symbol  correlation. 

•  Insensitivity  to  noise  correlation.

•  Possibility  to  make  it  adaptive.

Tong, Xu, and Kailath [6, 7] have developed a blind
identification algorithm (herewith referred to as the TXK
algorithm) which is based on an estimate of the autocorrelation
function of the observed channel-output samples. This feature
entails faster convergence than other blind algorithms based on
higher-order statistics [9], which is highly desirable when
the channel is time-varying and its variations have to be
tracked quickly in order to compensate for them. This
algorithm converges globally, can resolve the non-minimum-phase
zeros of the channel transfer function, and is robust
with respect to timing recovery. However, it suffers from
some drawbacks, viz.,

•  It is computationally intensive, as it requires two
singular-value matrix decompositions.

•  It requires data symbols to be uncorrelated.

•  It requires the noise to be uncorrelated.

•  It is not adaptive.

More recently, an improved algorithm (herewith referred
to as MDCM) was proposed by Moulines et al. [4]. The
advantages of this new algorithm over TXK are:

•  Lower computational complexity.

•  Convergence even with correlated data symbols.

•  Lower identification error for the same observation
length.

1This research was sponsored by the Human-Capital and
Mobility Program of the European Union.

Baccala  and  Roy  [1,  2]  have  proposed  a  new  algorithm 
(herewith  referred  to  as  BR)  that  presents  a  significantly 
lower  computational  complexity  and  an  identification  error 
close  to  that  of  TXK  and  MDCM  algorithms. 

II.  Our  results 

The  aim  of  this  paper  is  twofold.  First  we  compare,  by  com¬ 
bining  analysis  and  simulation,  the  TXK,  MDCM,  and  BR 
algorithms,  in  order  to  assess  their  applicability  to  the  equal¬ 
ization  of  digital  radio  links  affected  by  selective  fading.  Our 
results  show  that  in  general  these  algorithms  based  on  sec¬ 
ond  order  statistics  outperform  standard  blind  equalization 
in  terms  of  convergence  speed.  Moreover,  while  the  BR  algo¬ 
rithm  has  a  lower  computational  complexity,  in  this  specific 
application  the  MDCM  algorithm  outperforms  both  TXK  and 
BR  in  terms  of  robustness  to  source-data  correlation  and 
mean-square  estimation  error. 

Further,  we  derive  a  modification  of  the  MDCM  algorithm 
in  [4]  in  order  to  achieve  blind  identification  in  the  presence 
of  unknown  correlated  noise.  Our  algorithm  is  based  on  a 
matrix  decomposition  method  proposed  in  [5]. 
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Abstract  —  Iterative  methods  have  of  late  en¬ 
joyed  increasing  popularity  in  signal  restoration  prob¬ 
lems.  Inherent  mathematical  difficulties  have  led  re¬ 
searchers  to  propose  ad  hoc  solutions  in  many  in¬ 
stances.  The  question  of  optimality  of  such  solutions 
is  an  open  one.  This  paper  concerns  this  question  for 
a  class  of  iterative  methods  of  signal  restoration  and 
offers  a  criterion  for  optimality  based  on  information 
theory. 

I.  Introduction 

The signal restoration problem has classically been modelled
as that of estimating the input z to a system, assuming that
the distorting process h is specified or estimatable and that
the distorted output u is available. Further generalizations
incorporate any a priori knowledge about the solution into
the restoration process, in the form of constraint(s). A
well-defined system model in combination with robust techniques
naturally leads to good results. More interesting is the case
where the problem belongs to the class of ill-posed problems.
Regularization theory is used to pose a corresponding
well-posed problem, the solution to which is a close enough
approximation of the solution to the ill-posed problem being
considered [1].

II.  Algorithm  Derivation 

The problem is formulated as the constrained minimization
of a stabilizing functional [2] $\Omega(x)$. Recently, Noonan and
Achour [3], [4] studied the use of the Itakura-Saito distance
from communication theory and the Kullback-Leibler measure
from statistics as stabilizing functionals. They proposed
a generalized mapping function based on the use of the Mutual
Information Measure as the stabilizing functional and
incorporated a priori noise variance information as a mean squared
error constraint. Mathematically stated,

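Minimize:

$$\Omega(u, z) = \sum_{u} \sum_{z} p(u, z) \ln \frac{p(u, z)}{p(u)\, p(z)}$$

Subject to:

$$\frac{1}{N} \sum_{i=1}^{N} \big(u_i - (h * z)_i\big)^2 = \sigma^2$$

where $\sigma^2$ denotes the a priori noise variance.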

The proposed generalized mapping function resulting from
this optimization leads to the following general form,

$$\varphi(z_{n+1}) = \varphi(z_n) \exp\big(\lambda\, z'_n\, ([u - z_n * h] * h_f)\big)$$

where $h_f$ is the flipped, i.e. time-reversed, version of the
distorting process h and $z'$ is the partial derivative of z with
respect to $\varphi(z)$.


III.  Special  Cases  of  the  Mapping 

In this paper we demonstrate that various well-known ad hoc
algorithms are special cases of the proposed mapping function
[3], [4]. In particular, the pioneering Van Cittert method and
the popular Landweber restoration technique are shown to
belong to this class of optimum algorithms.

The specific mapping $\varphi(z) = \exp(z\,(z * h_f))$ yields

$$z_{n+1} = z_n + \lambda\, (u - h * z_n)$$

which is the Van Cittert restoration algorithm with a particular
smoothing function h, while the specific mapping
$\varphi(z) = \exp(z\,z)$ yields

$$z_{n+1} = z_n + \lambda\, (u - z_n * h) * h_f$$

which is the Landweber restoration algorithm with a particular
smoothing function h.

The algorithm generated by the use of the Kullback-Leibler
measure is given by

$$z_{n+1} = z_n \exp\big(\lambda\, ([u - z_n * h] * h_f)\big)$$

which can be generated from the generalized mapping function
by the trivial mapping $\varphi(z) = z$.
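
The two additive special cases, the Van Cittert update $z_{n+1} = z_n + \lambda (u - h * z_n)$ and the Landweber update $z_{n+1} = z_n + \lambda (u - z_n * h) * h_f$, can be sketched in a few lines. Circular convolution, the step size and the test signal are assumptions made to keep the example short; they are not taken from the paper.

```python
def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def flip(h):
    """Time-reversed (flipped) version of h under circular indexing."""
    n = len(h)
    return [h[(-k) % n] for k in range(n)]


def van_cittert(u, h, step, iterations):
    z = list(u)
    for _ in range(iterations):
        residual = [ui - ci for ui, ci in zip(u, circular_convolve(h, z))]
        z = [zi + step * ri for zi, ri in zip(z, residual)]
    return z


def landweber(u, h, step, iterations):
    z = list(u)
    h_flipped = flip(h)
    for _ in range(iterations):
        residual = [ui - ci for ui, ci in zip(u, circular_convolve(z, h))]
        correction = circular_convolve(residual, h_flipped)
        z = [zi + step * ci for zi, ci in zip(z, correction)]
    return z


# Illustrative blur with frequency response 0.6 + 0.4 cos(w), which stays positive.
blur = [0.6, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
truth = [0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 1.0, 0.0]
observed = circular_convolve(blur, truth)
restored_vc = van_cittert(observed, blur, step=1.0, iterations=200)
restored_lw = landweber(observed, blur, step=1.0, iterations=400)
```

With this blur both iterations converge to the true input. Landweber converges more slowly here because its contraction factor depends on the squared frequency response.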

This provides a sound statistical argument for the use of
these methods and establishes an optimality interpretation for
their estimates. The Mutual Information Measure is based on
the concepts of entropy and information content. Interestingly,
the proposed mapping function is derived by normalizing the
signal and identifying this as the probability density function
of the signal itself. This method can thus be applied to signals
whose probability structure is not fully known. The derived
algorithm has been shown to be stable and robust [5]. There
exists a strong condition for the convergence of the mapping
in the general case. For specific cases the condition simplifies
to a weaker, problem-specific condition.

Applications of the above algorithm are discussed. In
particular, noisy image restoration and high resolution estimation
of spectra are presented. This work is a continuation of that
presented in [3], [4], [5].
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Abstract  —  Stochastic  processes  subjected  to  a  peri¬ 
odic  clock  change  function  will  have  weighted  versions 
of  its  power  spectrum  reproduced  at  integer  multi¬ 
ples  of  the  jitter  frequency.  It  has  been  shown  that 
the  original  process  may,  in  theory,  be  reconstructed 
without error by a suitable choice of correction filter
[1]. In this paper we extend the results presented in
[1] to the general case where the resulting process is
a linear combination of N clock change functions.


I. Definitions

Let $Z = \{Z(t),\ t \in \mathbb{R}\}$ be a random wide sense stationary
process of zero mean and mean square continuous admitting
an autocorrelation function $K_Z(\tau)$ and a Cramér-Loève
representation $\Theta_Z(\omega)$ [2]. The results regarding a single periodic
clock change are given in [1]. We consider here the following
extended definition:

$$W(t) = \sum_{n=1}^{N} Z\big(t - f_n(t + \theta)\big)\, g_n(t + \theta) \qquad (1)$$

where $\theta$ is a random variable, independent of $Z(t)$ and
uniformly distributed on $(0, 2\pi)$. We assume that the
functions $f_n(\cdot)$ and $g_n(\cdot)$ are periodic with the same period
$T_n = 2\pi/\omega_n$, and that the frequencies are related in the
following manner:

$$\frac{\omega_m}{\omega_n} = \frac{p}{q}, \qquad (p, q) \in \mathbb{N} \times \mathbb{N}^*, \qquad \forall (m, n) \in \{1..N\}^2 \qquad (2)$$

There then exists a frequency $\lambda$, the smallest multiple of the
set $\{\omega_n\}$ such that the functions $f_n(\cdot)$ and $g_n(\cdot)$ are periodic
in $T_\lambda = 2\pi/\lambda$.

II. Power Spectrum and Linear Periodic
Filtering

Using the Cramér-Loève representation in (1), we find

$$W(t) = \int_{\mathbb{R}} e^{i\omega t} \sum_{n=1}^{N} e^{-i\omega f_n(t + \theta)}\, g_n(t + \theta)\, d\Theta_Z(\omega) \qquad (3)$$

where the summation is periodic (of period $T_\lambda$). If it can be
expressed in terms of its Fourier components

$$\sum_{n=1}^{N} e^{-i\omega f_n(u)}\, g_n(u) = \sum_{k=-\infty}^{\infty} \psi_k(\omega)\, e^{i k \lambda u} \qquad (4)$$

then $W(t)$ is a cyclostationary process, stationarised by the
phase $\theta$. $W(t - \theta)$ then admits a continuous series
representation whose elements, the responses of $Z(t)$ through the
linear invariant filters $\psi_k(\omega)$, are jointly stationary [3]. This
notation also demonstrates that $W(t - \theta)$ can be seen as the
filtering of $Z(t)$ by a linear periodic filter [4], whose impulse
response is given by

$$h(t, t - \tau) = \ldots \qquad (5)$$

The power spectrum of $W(t)$ follows directly from that of
$W(t - \theta)$:

$$dS_W(\omega) = \sum_{k=-\infty}^{\infty} |\psi_k(\omega - k\lambda)|^2\, dS_Z(\omega - k\lambda) \qquad (6)$$

III. Reconstruction

A linear reconstruction of $Z(t)$ based on the observation of
a frequency band centered around $k\lambda$ of $W(t)$ is the linear
projection of $Z(t)$ on the Hilbert space spanned by this
process. This reconstruction is ideal when the spectrum of $Z(t)$
is bounded such that

$$dS_Z(\omega) = 0 \ \ldots \qquad (7)$$

It can be shown that this is a filtering operation, where the
filter is given by

$$H_k(\omega) = \Pi_k(\omega) / \psi_k(\omega) \qquad (8)$$

where $\Pi_k(\omega)$ is an ideal bandpass filter centered at $\omega = k\lambda$
and $\psi_k(\omega)$ is given by

$$\psi_k(\omega) = \sum_{n=1}^{N} \frac{1}{T_\lambda} \int_0^{T_\lambda} e^{-i(\omega f_n(u) + k\lambda u)}\, g_n(u)\, du \qquad (9)$$

IV. Example

Consider the case where $Z(t)$ is a sequence of independent
bits, $N = 2$, and the periodic functions are given by

$$f_1(t) = \alpha \sin(\omega_1 t), \quad f_2(t) = \beta \sin(\omega_1 t/2), \quad g_1(t) = 1, \quad g_2(t) = \exp(i\omega_1 t/2) \qquad (10)$$

In this case, the receiving filter, based on the baseband signal,
is given by

$$H_0(\omega) = \frac{1}{J_0(\alpha\omega) + J_1(\beta\omega)} \qquad (11)$$

where $J_n(\cdot)$ is the n'th order Bessel function.

V. Conclusion

In this paper we have presented a generalisation of periodic
clock changes. We developed the optimal receiving filter and
found its analytical expression in a particular example.
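
The closed form in the example of Section IV can be checked numerically. The sketch below assumes that the baseband (k = 0) coefficient is the average over one common period $T_\lambda$ of $\sum_n e^{-i\omega f_n(u)} g_n(u)$, and compares it with $J_0(\alpha\omega) + J_1(\beta\omega)$ computed from the Bessel power series. The parameter values and function names are illustrative.

```python
import cmath
import math


def bessel_j(order, x, terms=40):
    """Bessel function of the first kind from its power series."""
    return sum(
        (-1) ** m / (math.factorial(m) * math.factorial(m + order))
        * (x / 2.0) ** (2 * m + order)
        for m in range(terms)
    )


def baseband_coefficient(omega, alpha, beta, omega1, steps=2000):
    """Period average of sum_n exp(-i w f_n(u)) g_n(u) for the two-function example."""
    lam = omega1 / 2.0            # common fundamental frequency
    period = 2.0 * math.pi / lam
    total = 0j
    for j in range(steps):
        u = period * j / steps
        f1 = alpha * math.sin(omega1 * u)
        f2 = beta * math.sin(omega1 * u / 2.0)
        g2 = cmath.exp(1j * omega1 * u / 2.0)
        total += cmath.exp(-1j * omega * f1) + cmath.exp(-1j * omega * f2) * g2
    return total / steps


omega, alpha, beta, omega1 = 1.3, 0.7, 0.4, 2.0
numeric = baseband_coefficient(omega, alpha, beta, omega1)
closed_form = bessel_j(0, alpha * omega) + bessel_j(1, beta * omega)
receiving_filter = 1.0 / closed_form  # H_0 at this frequency
print(abs(numeric - closed_form))
```

The agreement between the two values supports reading the denominator of the receiving filter as the zero-order Fourier coefficient of the periodic sum.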
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Abstract  —  A  near-optimum  method  for  filtering 
out  the  quantization  noise  is  presented.  Use  is  made 
of  the  result  that  the  spectrum  of  the  quantization 
noise  is  related  to  the  probability  density  function  of 
the  signal  derivative. 

I.  Introduction 

The  need  for  representing  signals  by  a  finite  number  of 
bits  implies  that  quantization  noise  is  present  in  almost  all 
digital  signal  processing  systems  and  inherently  occurs  in  the 
analog-to-digital  conversion  process.  The  distortion  error,  or 
quantization  noise,  consists  of  the  difference  between  the  input 
to  the  quantizer  and  the  discrete  output  signal. 

In  the  following  a  near-optimum  method  for  filtering  out 
the  quantization  noise  is  presented.  Use  is  made  of  the  result 
that  the  spectrum  of  the  quantization  noise  is  remarkably  re¬ 
lated  to  the  probability  density  function  (pdf)  of  the  signal 
derivative. 


II.  Near  Optimum  Filtering  of  Quantized 
Signals 

For a signal $x(t)$, assumed stationary in the wide sense, the
power spectral density (PSD) of the quantization noise was
shown to be given by [1]

$$S_N(\omega) = \frac{d^2}{2\pi^2} \sum_{n=1}^{\infty} \frac{1}{n^3}\, p_{x'}\!\left(\frac{\omega d}{2\pi n}\right) \qquad (1)$$

where $p_{x'}(\cdot)$ is the pdf of the derivative of the input signal.

Equation 1 demonstrates that the PSD of the quantization
noise is related to the pdf of the derivative of the input signal.
The convergence of the noise spectrum to Equation 1, as the
stepsize decreases, is a result of a previous work [2].

In order to design the optimum filter for extracting the
signal corrupted by noise, there are two common approaches:
maximization of the signal to quantization noise ratio (SQNR),
which leads to the matched filter, or minimization of the mean
square difference between the input signal and its estimate,
leading to the theory of Wiener filtering.

It is useful to consider an estimator based on the Wiener
filter, which minimizes the expected value $E[x(t) - \hat{x}(t)]^2$, where
the estimator is given by

$\hat{x}(t) = \int_{-\infty}^{\infty} h(t - \tau)\,[x(\tau) + n(\tau)]\,d\tau.$ (2)


If a Gaussian process is assumed as the input signal, the
cross correlation between the quantization noise and the signal
is given by [3]

$R_{xn}(\tau) = 2 R_x(\tau) \sum_{n=1}^{\infty} (-1)^n e^{-\pi^2 n^2\,\mathrm{SQNR}/6}.$ (3)

1This  work  was  partially  sponsored  by  the  Brazilian  Council  for 
Scientific  and  Technological  Research  (CNPq). 


Figure  1:  Near  optimum  filters  for  selected  quantization 
stepsizes. 


For an SQNR above 0 dB, the cross correlation is about
eight orders of magnitude smaller than the input signal
autocorrelation. This corroborates the assumption of uncorrelated
noise at the output of the quantizer. Therefore, based on
the formula for the noise spectrum, one can design a Wiener
filter that is near optimum for the above conditions,

$H(\omega) = \frac{S_x(\omega)}{S_x(\omega) + S_N(\omega)}$ (4)

where $S_x(\omega)$ is the signal PSD and $S_N(\omega)$ is the quantization noise PSD of Equation 1.
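The size of the cross correlation relative to the signal autocorrelation can be checked numerically. The sketch below is illustrative only: it assumes that Equation 3 has the alternating-series form $2\sum_{n\ge 1}(-1)^n e^{-\pi^2 n^2\,\mathrm{SQNR}/6}$ and that SQNR enters as a linear power ratio; the function name and the truncation parameter `terms` are hypothetical.

```python
import math

def cross_correlation_ratio(sqnr_db, terms=50):
    """Ratio R_xn(tau) / R_x(tau) under the assumed series form of Equation 3.

    sqnr_db : SQNR in decibels (converted to a linear power ratio below).
    terms   : number of series terms kept; the terms decay very quickly.
    """
    sqnr = 10.0 ** (sqnr_db / 10.0)  # dB -> linear power ratio (assumption)
    return 2.0 * sum(
        (-1) ** n * math.exp(-math.pi ** 2 * n * n * sqnr / 6.0)
        for n in range(1, terms + 1)
    )

# Print the ratio for a few SQNR values to see how fast it shrinks.
for db in (0, 5, 10, 15):
    print(f"SQNR = {db:2d} dB -> ratio = {cross_correlation_ratio(db):.3e}")
```

Under these assumptions the magnitude of the ratio falls off very rapidly as the SQNR grows, which is the behaviour the uncorrelated-noise approximation relies on.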

Figure  1  depicts  the  results  of  application  of  Formula  4, 
with  the  signal  obtained  by  passing  white  Gaussian  noise 
through  a  first-order  Butterworth  lowpass  filter  [4],  for  se¬ 
lected  values  of  the  stepsize,  and  shows  how  the  number  of 
levels  can  influence  the  design  of  a  filter.  The  signal  spectrum 
is  also  drawn  in  the  same  figure.  It  seems  clear  from  this  fig¬ 
ure  that  the  optimum  filter  tends  to  an  allpass  filter  as  the 
stepsize  decreases. 
I. Introduction

A  distributed  detection  system  is  considered  in  which  two  sen¬ 
sors  and  a  fusion  center  jointly  process  the  output  of  a  random 
data  source  (see  figure).  It  is  assumed  that  the  null  and  alter¬ 
native  distributions  are  spatially  correlated  Gaussian,  differing 
in  the  mean;  thus  the  random  source  is  either  noise  only  or  a 
deterministic  signal  plus  noise. 


In the presence of spatial dependence, the joint optimization
of local quantizers gx, gy, and global decision rule V may
yield solutions in which gx and gy are not based on marginal
likelihood ratio tests. This is one instance where distributed
detection departs from the traditional statistical framework
where likelihood ratios are sufficient for most purposes. This
departure was first noted in [1], and was corroborated specifically
for the additive Gaussian noise model by means of a
counterexample [2] involving two-dimensional vectors X and
Y.

This work is an attempt to characterize noise models for
which the optimal system employs marginal likelihood ratio
tests. In the setup where each sensor draws one local observation
(i.e., the vectors X and Y are scalars X and Y, respectively),
we succeed in obtaining a sufficient condition on the noise
mean and covariance under which the optimal binary quantizers
are contiguous partitions of the marginal observation
space. Since the marginal likelihood ratio is a linear function
of the local observation (X or Y), this result implies that gx
and gy are threshold-type functions of the marginal likelihood
ratio. It also reduces the optimization to identifying break
points (thresholds) in the marginal observation space.

We also examine whether this sufficient condition is also
necessary, and find that violation of this condition may in
certain, but not all, cases render the contiguous marginal
likelihood ratio partition suboptimal. We reach this conclusion
by examining the special case where the noise marginals are
the same for both sensors; the sufficient condition is then
equivalent to positive correlation between X and Y. We find
that for values of the correlation coefficient ρ(X, Y) close to −1,
local quantizers based on non-contiguous likelihood ratio
partitions outperform those based on contiguous likelihood
ratio partitions. We were not able to establish the same for
ρ(X, Y) close to 0⁻.

Finally, we consider the following question. Assuming that
the sufficient condition discussed previously is satisfied, does
symmetry in the signal and noise models (same marginal for
both sensors) imply symmetry in the optimal solution, with gx
and gy being identical contiguous partitions of the real line?
We find that this is indeed true, and in such cases, optimal
design is further simplified.
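The linearity of the marginal likelihood ratio noted above can be made explicit with a short worked step. The notation here is an assumption made for illustration: the scalar observation X is taken to be N(0, σ_x²) under the null hypothesis (density p_X) and N(μ_x, σ_x²) under the alternative (density q_X), in line with the statement that the two distributions differ only in the mean.

```latex
\log\frac{q_X(x)}{p_X(x)}
  = -\frac{(x-\mu_x)^2}{2\sigma_x^2} + \frac{x^2}{2\sigma_x^2}
  = \frac{\mu_x}{\sigma_x^2}\,x - \frac{\mu_x^2}{2\sigma_x^2}
```

Under these assumptions the log-likelihood ratio is an affine, monotone function of x, so a threshold on the marginal likelihood ratio is the same thing as a threshold on x, which is why a contiguous binary partition of the real line corresponds to a marginal likelihood ratio test.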

1  Po-Ning  Chen  is  with  the  Computer  and  Communication  Re¬ 
search  Laboratories  at  the  Industrial  Technology  Research  Institute, 
Hsin-Chu,  Taiwan  ROC. 

Adrian  Papamarcou  is  with  the  Department  of  Electrical  En¬ 
gineering  at  the  University  of  Maryland,  College  Park,  USA. 


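II. Statement of Results
The observation statistics are denoted by

$H_0: P_{XY} = \mathcal{N}\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix} \right)$

$H_1: Q_{XY} = \mathcal{N}\!\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix} \right)$

A Bayesian setting is assumed, in which $H_0$ and $H_1$ are
assigned prior probabilities. Also, quantizers are binary
throughout, i.e., $\|g_x\| = \|g_y\| = 2$.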

Theorem 1 If

$\sigma_{xy}\,(\mu_x \sigma_y^2 - \mu_y \sigma_{xy})\,(\mu_y \sigma_x^2 - \mu_x \sigma_{xy}) > 0,$ (1)

then there exist optimal quantizers of X and Y which are
contiguous partitions of the real line.


Counterexample  Assume  a  uniform  prior.  Let  =  ay  = 
T = g = 1 and $\sigma_{xy} = -1$, so that (X, Y) lies on a straight line
with probability 1 under each hypothesis. It can be shown
that every contiguous binary partition of the real line is
outperformed by a noncontiguous one.

Remark The above counterexample clearly represents an
extreme case where either of the local observations is a sufficient
statistic for centralized testing. The same effect, however, can
be obtained by choosing $\sigma_{xy} \approx -1$ and applying a continuity
argument. A nondegenerate counterexample can then be
constructed.


Theorem 2 Let $\mu_x = \mu_y$ and $\sigma_x = \sigma_y$. If the local quantizers
are constrained to be binary contiguous partitions of the
real line, then the optimal quantizer pair employs the same
threshold in both quantizers.

In  conjunction  with  Theorem  1,  the  above  theorem  implies 
the  following  corollary. 


Corollary 1 Let the signal and noise models be symmetric. If
$\sigma_{xy} > 0$, then an optimal solution exists in which both quantizers
use the same contiguous partition of the observation space.
ABSTRACT - In this paper we report on some progress
towards a solution for the assignment problem in coding, i.e., the mapping
of  information  words  onto  codewords  at  the  encoder,  and  the 
computationally  more  difficult  problem  of  mapping  codewords  onto 
information  words  at  the  decoder. 

I.  Introduction 

The  assignment  problem  is  still  unsolved  for  many  classes  of  constrained  or 
nonlinear codes [1-7], including some recording codes developed in recent
years  [2].  A  few  previous  approaches  have  been  based  on  applying  techniques 
such  as  Pascal's  triangle  [1,2]  or  mapping  block  codes  onto  trellises  [3]. 

Previously,  we  investigated  the  assignment  problem  for  constrained  codes 
with  short  symbol  lengths,  in  order  to  minimize  error  extension  at  the  decoder 
[8].  In  this  paper,  we  consider  longer  codeword  lengths  and  focus  on 
designing  a  mapping  algorithm  feasible  for  implementation  when  a  single 
lookup  table  for  mapping  codewords  onto  information  words,  becomes  too 
large  to  implement. 

Traditionally, coding and mapping algorithms have been based on
computations which can be modeled as integer manipulations. The mapping
algorithm that we propose in this paper exploits the capability of modern
digital circuits to handle rational numbers. It can be implemented with
magnitude comparison of integers and two lookup tables, which are for many
codes of interest much smaller than the abovementioned single lookup table.
For each codeword, a moment with rational weights is computed, similar to
the moment computed in the Varshamov-Tenengolts construction [7], or
when constructing higher-order spectral null codes [2]. If necessary, this
computation can be implemented as arithmetic multiplication and division of
integers.

II. Algorithm

Consider the mapping of codewords onto information words at the decoder
of an (n, k) binary block code. An exhaustive lookup table requires a memory
of size $2^n$, and it may be infeasible to implement. We propose a decoder with
memory upperbounded by $2^{k+1}/n$. Our algorithm is thus of interest when
decoding codes where a memory of $2^{k+1}/n$ is feasible to implement, while a
memory of $2^n$ is infeasible. For example, many constrained, constant weight,
or nonlinear codes of interest have $R = k/n \approx 1/2$ and $k \le 20$ [2, 4, 5, 6].

When setting up the decoder, we start with the set of $2^k$ n-bit codewords,
$x = (x_1, \ldots, x_n)$, which are ordered using the standard lexicography of n-bit
binary numbers. Next, we partition the codebook into $2^k/n$ subsets of
consecutive codewords, each with cardinality upperbounded by n. For each
subset, we set up a system of n linear equations as follows. For the i-th
codeword, $x^i$, we set

$\sum_{j=1}^{n} a_j x_j^i = h^i$ (1)

where $h^i$ is the integer representation of the information word onto which
this codeword is mapped. We now use the set of n linear equations to solve
a set of weights $\{a_j\}$ for each subset of codewords. These sets of weights are
stored in Table A of dimension $2^k/n$ at the decoder. A second lookup table of
dimension $2^k/n$ at the decoder, Table B, is used to store the lexicographically
last codeword of each subset.

When  mapping  a  codeword  onto  an  information  word,  the  decoder  compares 
it  to  the  entries  in  Table  B,  to  determine  which  entry  from  Table  A  should 
be  used  to  compute  the  information  word,  using  (1). 
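The decoder set-up and lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`solve_rational`, `build_tables`, `decode`) are hypothetical, the moment is taken to be a weighted sum of the codeword bits with one rational weight per bit position, and the toy codebook (the six weight-2 words of length 4, which is not of size 2^k) with information integers 0..5 is made up for the example. A subset whose linear system has no solution simply raises an error here; the text does not discuss that case.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import product

def solve_rational(rows, rhs):
    """Return one exact solution a of rows * a = rhs (free variables set to 0)."""
    m, n = len(rows), len(rows[0])
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if aug[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        aug[r], aug[p] = aug[p], aug[r]
        piv = aug[r][c]
        aug[r] = [v / piv for v in aug[r]]
        for i in range(m):                # clear column c in all other rows
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [vi - f * vr for vi, vr in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    if any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in aug):
        raise ValueError("inconsistent system for this subset")
    sol = [Fraction(0)] * n
    for i, c in enumerate(pivots):
        sol[c] = aug[i][-1]
    return sol

def build_tables(codebook, info, n):
    """Table A: one weight vector per subset. Table B: last codeword per subset."""
    table_a, table_b = [], []
    for start in range(0, len(codebook), n):
        subset = codebook[start:start + n]
        table_a.append(solve_rational(subset, info[start:start + n]))
        table_b.append(subset[-1])
    return table_a, table_b

def decode(word, table_a, table_b):
    """Locate the subset by comparison with Table B, then compute the moment."""
    idx = bisect_left(table_b, word)      # first subset whose last word >= word
    return sum(w * x for w, x in zip(table_a[idx], word))

# Toy example (hypothetical): weight-2 words of length 4 in lexicographic order.
n = 4
codebook = sorted(c for c in product((0, 1), repeat=n) if sum(c) == 2)
info = list(range(len(codebook)))
table_a, table_b = build_tables(codebook, info, n)
decoded = [decode(c, table_a, table_b) for c in codebook]
```

In this toy case the first subset yields genuinely rational weights, which is the point of working with exact fractions rather than integers; the binary search over Table B stands in for the magnitude comparison mentioned in the text.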

While  the  algorithm  is  thus  conceptually  simple,  it  may  present  interesting 
algebraic  or  combinatorial  problems  when  taking  advantage  of  the  structure 
inherent  in  many  codebooks.  For  some  classes  of  codes,  each  codeword's 
complement  is  also  in  the  codebook  or  the  codebook  can  be  partitioned  into 
codebooks  with  words  which  are  identical,  except  for  different  prefixes.  In 
these  cases,  it  is  possible  to  reduce  the  size  of  the  lookup  tables  at  the 
decoder.  The  memory  requirements  may  be  reduced,  or  the  necessity  of 
accessing  the  entries  in  Table  B  sequentially,  may  also  be  precluded,  using 
tree  searches. 

III. Conclusions

Many  interesting  classes  of  constrained,  constant  weight,  or  other  nonlinear 
codes,  have  previously  been  constructed  in  the  literature.  While  the  useful 
properties  of  these  codes  have  been  proved,  the  difficult  problem  of  mapping 
codewords  onto  information  words,  is  often  not  addressed.  With  this  paper, 
we  hope  to  contribute  towards  a  general  approach  to  solve  this  problem. 
Abstract — In this talk we consider real-time nonparametric
algorithms for nonstationarity detection
and  SVD  updating  based  on  Jacobi  rotations.  We 
propose  two  schemes  which  improve  the  overall  per¬ 
formance  when  the  rate  of  change  of  the  data  is  high. 
In  the  “variable  rotational  rate”  scheme,  the  number 
of  Jacobi  rotations  per  update  is  dynamically  deter¬ 
mined.  In  the  “variable  forgetting  factor”  approach, 
the  effective  width  of  the  observation  adjusts  to  the 
data  nonstationarity. 

I.  Introduction 

In  this  talk,  we  investigate  the  algorithmic  and  architectural 
relationships  among  the  input  update  rate,  the  rate  of  con¬ 
vergence  of  the  Jacobi-SVD  algorithm,  and  the  quality  of  the 
SVD  processed  outputs.  This  approach  provides  new  insights 
on  the  selection  of  forgetting  factors  needed  in  adaptive  signal 
processing.  We  also  obtain  a  real-time,  nonparametric  non¬ 
stationarity  indicator  of  the  observed  data  in  terms  of  their 
singular  value  behavior.  The  proposed  algorithm  has  been  ap¬ 
plied  to  problems  in  DOA  estimation,  speech  segmentation, 
and  linear  prediction. 

II.  The  Jacobi  SVD  Algorithm 

Given the computed matrices $U_m$, $\Sigma_m$, $V_m$:
•  application of forgetting and vector projection

      $\beta\Sigma_m$,   $x'_{m+1} \leftarrow x_{m+1} V_m$
•  QR updating

•  Jacobi rotations (rediagonalization)

      $U_{m+1} \leftarrow U_m$,   $\Sigma_{m+1} \leftarrow \Sigma'_m$,   $V_{m+1} \leftarrow V_m$

      for k = 1, ..., ℓ;  for i = 1, ..., n − 1

         Apply a Jacobi rotation to rows and columns i and i + 1 of $\Sigma_{m+1}$

         Propagate the rotations to $U_{m+1}$ and $V_{m+1}$

      end

The  QR  and  the  Jacobi  rotation  steps  can  be  implemented  on 
a  parallel/systolic  architecture  [3]. 

Variable  Rotational  Rate  Scheme.  For  sufficiently  slowly 
changing  data,  a  slowly  updating  implementation  of  the  Ja¬ 
cobi  SVD  algorithm  produces  the  same  (or  better)  estimates 
than  a  higher  throughput  implementation,  for  equal  computa¬ 
tional  rate  [1].  When  the  data  variation  increases,  a  higher  up¬ 
dating  rate  with  no  computation  rate  increase  produces  com¬ 
puted  singular  matrices  which  are  far  from  convergence.  The 
idea  we  explored  is  to  “decouple”  the  updating  rate  from  the 
speed  at  which  rotations  are  computed  (“rotational  rate”). 

1This  work  is  supported  by  NASA/Dryden  grant  NCC  2-374. 


Consider the QR factorization required by the updating
algorithm, where $\Sigma'_m$ is upper triangular. In order to
estimate the number of Jacobi rotations needed to diagonalize
$\Sigma'_m$, what is of interest is the amount of fill-in in the
submatrix of $\Sigma'_m$. This is in turn related to the value
$\alpha_{m+1} = \|x_{m+1} V_m^n\| / \|x_{m+1}\|$, m = 0, 1, ..., where $V_m = (V_m^s, V_m^n)$, and
$x_{m+1}$ is the incoming vector. The quantity $\alpha_{m+1}$ represents
the degree of nonstationarity of the incoming data and is easily
computed in the Jacobi algorithm.
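A nonstationarity indicator of this kind is cheap to compute from the current right singular vectors. The sketch below is illustrative only: it assumes the indicator is the fraction of the incoming row vector's norm that falls outside the dominant r-dimensional subspace (this reading of the definition is an assumption), and the function name, the argument `r`, and the convention that the first `r` columns of `V` span the dominant subspace are hypothetical.

```python
import numpy as np

def nonstationarity_indicator(x, V, r):
    """Norm of the projection of x onto the trailing columns of V, over the norm of x.

    x : incoming data vector of length n (treated as a row vector).
    V : n x n matrix of current right singular vectors (columns).
    r : assumed numerical rank; columns r..n-1 are taken as the complement subspace.
    """
    x = np.asarray(x, dtype=float)
    V = np.asarray(V, dtype=float)
    projected = x @ V[:, r:]          # coordinates of x in the trailing subspace
    return np.linalg.norm(projected) / np.linalg.norm(x)

# Usage with a trivial basis: the first axis plays the role of the dominant subspace.
V = np.eye(3)
alpha_inside = nonstationarity_indicator([1.0, 0.0, 0.0], V, r=1)   # vector in the subspace
alpha_mixed = nonstationarity_indicator([3.0, 4.0, 0.0], V, r=1)    # partly outside
```

A value near 0 means the new vector is explained by the current subspace estimate, while a value near 1 means it lies almost entirely outside it.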

From our analysis of the initial convergence behavior of the
Jacobi SVD algorithm, as well as the behavior of the off-norm
of $\Sigma_m$ in time, we propose the following variable rotational
rate scheme, for medium to high SNR, noise power $\sigma_n^2$ and
numerical rank = r:

1) Compare the nonstationarity indicator $\alpha_{m+1}$ to a threshold
function of $\sigma_n$.
2) If $\alpha_{m+1} < p_1$, then choose a value for ℓ not
smaller than r. Otherwise choose ℓ > n.
3) Choose a high enough forgetting factor, which guarantees that
the diagonal elements of $\Sigma_m$ are sufficiently large (cf. below).

Variable Forgetting Scheme. We have also studied the
relationship between the data variation and the forgetting factor.
SVD tracking requires narrower observation windows as the
rate of data variation increases. If it is required that the
number of Jacobi rotations which rediagonalize $\Sigma'_m$ be kept low,
then the amount of fill-in produced by the QRD step has to
be limited. This is achieved by setting a minimum value for
β. The proposed variable forgetting scheme is summarized as:
1) Determine at every time instant the duration of the
stationarity window, $N_w$.
2) Given a threshold δ, compute β so that $\beta^{N_w} \le \delta$.
3) Make sure that β is not too small, $\beta > \beta_{\min}$.
Compute β as $\beta = \max\{\delta^{1/N_w}, \beta_{\min}\}$.
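The last step of the variable forgetting scheme amounts to a one-line computation. The sketch below assumes the rule β = max{δ^(1/N_w), β_min}; the function and argument names are hypothetical, and the numerical values in the usage lines are made up for illustration.

```python
def forgetting_factor(delta, window_length, beta_min):
    """Forgetting factor for a stationarity window of window_length samples.

    delta         : threshold in (0, 1); beta**window_length equals delta unless clipped.
    window_length : estimated stationarity window N_w (positive integer).
    beta_min      : lower bound that limits the fill-in produced by the QR step.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    if window_length < 1:
        raise ValueError("window_length must be at least 1")
    return max(delta ** (1.0 / window_length), beta_min)

# A long stationarity window gives a factor close to 1; a short one is clipped.
beta_long = forgetting_factor(0.01, 100, 0.9)
beta_short = forgetting_factor(0.01, 10, 0.9)
```

The clipping in the second call shows the trade-off described in the text: a very short window would call for a small β, but the lower bound keeps the number of rotations needed per update manageable.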

The  proposed  SVD  updating  algorithm  can  find  application 
in  many  situations,  such  as  beamforming,  adaptive  filtering, 
DOA tracking, speech processing (segmentation, glottal closure
detection), and adaptive parameter estimation [1]. In all the
cases  considered,  the  algorithm  promptly  detects  signal  non- 
stationarities,  whether  in  amplitude  or  phase.  The  ability  to 
track  data  variability  is  exploited  for  real-time  adaptation  of 
the  SVD  updating  algorithm,  thereby  producing  more  accu¬ 
rate  estimation  of  singular  values/subspaces.  In  certain  appli¬ 
cations,  such  as  speech  segmentation,  the  proposed  algorithm 
can  be  used  with  the  double  function  of  detecting  data  tran¬ 
sitions  (voiced  to  unvoiced)  and  computing  the  desired  filter 
parameters.  This  algorithm  can  be  implemented  on  a  parallel 
(systolic)  processor  with  relative  ease. 
Abstract — We consider the LMS estimation of a
channel  that  may  be  well  approximated  by  an  FIR 
model  with  only  a  few  nonzero  tap  coefficients  within 
a  given  delay  horizon  or  tap  length  n.  When  the  num¬ 
ber  of  nonzero  tap  coefficients  m  is  small  compared  to 
the  delay  horizon  n,  the  performance  of  the  LMS  es¬ 
timator  is  greatly  enhanced  when  this  specific  struc¬ 
ture  is  exploited.  We  propose  a  consistent  algorithm 
that  performs  identification  of  nonzero  taps  only. 

I.  Introduction 

In  various  adaptive  estimation  applications,  the  unknown 
channel  is  characterized  by  an  impulse  response  which  con¬ 
sists  of  extended  regions  of  negligible  response  or  ‘inactivity’. 
Examples  include  circuit  echo  paths  within  4-wire  loop  tele¬ 
phony  networks,  which  typically  show  initial  inactive  regions 
within  their  impulse  responses,  and  room  acoustic  echo  paths 
and  mobile  radio  channels  which  typically  show  impulse  re¬ 
sponses  having  many  inactive  regions  interspersed  by  ‘active’ 
or  nonzero  regions.  Our  aim  is  to  develop  a  technique  which 
discriminates  between  the  active  and  inactive  regions  of  such 
channels  and  to  subsequently  LMS  estimate  only  the  active 
regions  of  the  channel. 

II.  System  Description 

Assumption 1: The unknown channel is linear, time invariant and
is adequately modelled by a discrete-time FIR filter $\theta(z^{-1})$
with a maximum lag of n sample intervals.

Assumption 2: Only m < n of the taps of $\theta(z^{-1})$ are nonzero.
Assumption 3: All signals are sampled. At sampling instant k:
u(k) is the signal input to the unknown channel and the channel
estimator; an additive disturbance, s(k), occurs within
the unknown channel; and $y(k) = U(k)^T \theta + s(k)$ is the observed
output from the unknown channel, where θ is the n-tap
unknown channel tap vector and U(k) is the n-tuple vector
containing the last n input samples.

Assumption  4:  (i)  The  input  signal  and  disturbance  signals 
are  zero  mean  bounded  wide  sense  stationary,  (ii)  The  input 
and  disturbance  signals  are  uncorrelated  with  each  other  over 
time,  (iii)  The  n  x  n  input  signal  covariance  matrix  R  is  pos¬ 
itive  definite,  (iv)  The  input  signal  is  uncorrelated  over  time 
(‘white’). 

III.  Active  Tap  Detection 

The aim is to determine the positions of the m nonzero
elements of θ. The approach taken is to minimize the
Least Squares cost function $V_N(\theta(N)) = [\sum_{k=1}^{N} (y(k) - U(k)^T \theta(N))^2]/N$
under the restriction that all but m elements of θ are zero.
In general, this requires the calculation and comparison of
$V_N(\theta(N))$ for $n!/[m!(n-m)!]$ different tap combinations. For
signals u(k) and s(k) which satisfy the assumptions above, we can
show that, for sufficiently large N, the LS cost function
$V_N(\theta(N))$ can be approximated by a cost function in which the
contribution of each tap is decoupled from the rest. This leads
to the following result.

¹The authors wish to acknowledge the funding of the activities
of the Cooperative Research Centre for Robust and Adaptive
Systems by the Australian Government under the Cooperative
Research Centres Program.

Result 1 Subject to the validity of Assumptions 1-4, for
sufficiently large N, the positions of the m most active taps
of the FIR modelled unknown channel are given by the indices
corresponding to the m greatest values of:

$X_N(j) = \left[\sum_{k=j+1}^{N} y(k)\,u(k-j)\right]^2 \Big/ \sum_{k=j+1}^{N} u^2(k-j).$

To enable a tap of the unknown channel to be classed as
‘active’ or ‘inactive’, rather than just more or less active,
a threshold must be developed for the tap activity measure
$X_N(j)$. This is achieved by considering a structurally consistent
version of the LS cost function:

$W_N(\theta(N)) = V_N(\theta(N)) + C\,m \log(N)/N,$

where m is the number of active taps to be determined and C
is a constant independent of m, N. Through an extension of
Result 1 above and application of work by Donoho cited in
[1], we obtained the following result.

Result 2 Subject to the validity of Assumptions 1-4, for
sufficiently large N, the positions of the active taps of the
FIR modelled unknown channel are given by the indices j for
which $X_N(j) > \sigma_y^2 \log N$, where $\sigma_y^2$ is the variance of y(k).

Simulations demonstrate that this tap activity criterion
leads to fast detection of the active taps of the unknown channel.
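The tap activity test can be tried out on synthetic data. The sketch below is a batch illustration under stated assumptions: the activity measure is taken to be the squared input-output correlation at lag j divided by the input energy at that lag, the threshold is the sample variance of y times log N, lags are indexed from 0 rather than 1, and the function names, the sparse channel, the noise level, and the random seed are all made up for the example.

```python
import numpy as np

def tap_activity(u, y, n):
    """Activity measure X[j] = (sum_k y[k] u[k-j])**2 / sum_k u[k-j]**2 for j = 0..n-1."""
    N = len(y)
    X = np.empty(n)
    for j in range(n):
        uj = u[:N - j]                       # u[k-j] for k = j, ..., N-1
        X[j] = np.dot(y[j:], uj) ** 2 / np.dot(uj, uj)
    return X

def detect_active_taps(u, y, n):
    """Indices whose activity exceeds var(y) * log(N), together with the measure itself."""
    X = tap_activity(u, y, n)
    threshold = np.var(y) * np.log(len(y))
    return np.flatnonzero(X > threshold), X

# Synthetic sparse channel (hypothetical): 16 taps, only taps 2 and 9 are nonzero.
rng = np.random.default_rng(0)
N, n = 20000, 16
theta = np.zeros(n)
theta[2], theta[9] = 1.0, -0.5
u = rng.standard_normal(N)                   # white input
y = np.convolve(u, theta)[:N] + 0.1 * rng.standard_normal(N)

detected, X = detect_active_taps(u, y, n)
largest_two = sorted(np.argsort(X)[-2:].tolist())
```

For a white input the measure at an active tap grows roughly in proportion to N, while at an inactive tap it stays of the order of the output variance, which is why a threshold growing like log N separates the two groups.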

IV.  LMS  Estimation  Via  Detection 

•  Determine at each sample interval k the indices j which satisfy
the active condition $X_j(k) > \hat{\sigma}_y^2(k) \log(k)$, where $\hat{\sigma}_y^2(k)$ is
an estimate of $\sigma_y^2$, the variance of y(k).

•  For sample interval k: (i) LMS update those taps in the
LMS estimator which correspond to the active tap indices (of
sample interval k); (ii) apply an exponentially decaying
(forgetting) function to the remaining taps (corresponding to the
inactive tap indices of sample interval k).

This  structural  detection  LMS  algorithm  can  be  easily  modi¬ 
fied  to  obtain  an  NLMS  version. 

Simulations  demonstrate  that  the  structural  detection 
LMS/NLMS  algorithms  provide  considerably  better  asymp¬ 
totic/transient  performance,  respectively,  than  the  standard 
LMS/NLMS  algorithms  in  which  full  parametrization  is  used. 
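The two steps of the structural detection LMS algorithm can be combined in a single loop. The sketch below is a minimal illustration, not the authors' implementation: the activity measure is assumed to be the squared running input-output correlation at each lag divided by the running input energy, the variance estimate is a running mean of y², lags are indexed from 0, and the function name, the step size `mu`, the decay factor `decay`, and the synthetic channel are hypothetical choices.

```python
import numpy as np

def detection_lms(u, y, n, mu=0.01, decay=0.99):
    """LMS that updates only taps currently flagged active and decays the rest."""
    theta = np.zeros(n)
    num = np.zeros(n)        # running sums of y[k] * u[k-j]
    den = np.zeros(n)        # running sums of u[k-j]**2
    sum_y2 = 0.0
    U = np.zeros(n)          # regressor [u[k], u[k-1], ..., u[k-n+1]]
    for k in range(len(y)):
        U = np.roll(U, 1)
        U[0] = u[k]
        num += y[k] * U
        den += U * U
        sum_y2 += y[k] ** 2
        var_y = sum_y2 / (k + 1)                      # running variance estimate
        activity = num ** 2 / np.maximum(den, 1e-12)
        active = activity > var_y * np.log(k + 1)     # active-tap condition
        err = y[k] - U @ theta
        theta[active] += mu * err * U[active]         # (i) LMS update on active taps
        theta[~active] *= decay                       # (ii) forgetting on inactive taps
    return theta

# Synthetic sparse channel (hypothetical): taps 2 and 9 are the only nonzero ones.
rng = np.random.default_rng(1)
N, n = 20000, 16
theta_true = np.zeros(n)
theta_true[2], theta_true[9] = 1.0, -0.5
u = rng.standard_normal(N)
y = np.convolve(u, theta_true)[:N] + 0.1 * rng.standard_normal(N)
theta_hat = detection_lms(u, y, n)
```

Because only a few taps are adapted at any time, the gradient noise contributed by the inactive taps is removed, which is one way to read the performance gain over the fully parametrized LMS reported in the text.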
Abstract — We show that the family of maximal fixed-cost
codes, with codeword costs defined in a right-cancellative
semigroup, have biproper trellis presentations.
Examples of maximal fixed-cost codes include
such  “nonlinear”  codes  as  permutation  codes,  shells 
of  constant  Euclidean  norm  in  the  integer  lattice,  and 
of  course  ordinary  linear  codes  over  a  finite  field.  The 
intersection  of  two  codes  having  biproper  trellis  pre¬ 
sentations  is  another  code  with  a  biproper  trellis  pre¬ 
sentation;  therefore  “nonlinear”  codes  such  as  lattice 
shells  or  words  of  constant  weight  in  a  linear  code 
have  biproper  trellis  presentations. 

I.  Biproper  Trellis  Presentations 

A  proper  trellis  presentation  for  a  block  code  C  is  one  in  which 
the set of edges emanating from any trellis vertex are labelled
distinctly [1]. A “biproper” trellis is a proper trellis that
remains proper when the direction of all trellis edges is reversed.
Although  not  all  codes  have  biproper  trellis  presentations  [2], 
when  a  code  has  a  biproper  trellis  presentation,  the  fortu¬ 
nate  circumstance  arises  in  which  the  biproper  trellis  simul¬ 
taneously  minimizes  the  vertex  count  at  each  time  index,  the 
trellis  presentation  is  unique  (up  to  relabeling),  the  trellis  is 
one-to-one,  and  subtrellises  are  also  biproper.  In  the  language 
of  dynamical  systems  theory  [3],  codes  with  biproper  trellises 
have  a  well-defined  and  unique  minimal  state  realization. 

Let  C  be  a  code  of  length  n,  i.e.,  a  set  of  n-tuples.  The 
following  conditions  are  equivalent: 

1.  C  has  a  biproper  trellis  presentation. 

2.  C  forms  a  rectangular  relation  [2]  at  each  time  index. 

3.  For a fixed partition (·, ·) of codewords into “past” and
“future” [3], (a, d) ∈ C, (a, c) ∈ C, and (b, c) ∈ C
together imply (b, d) ∈ C for all possible (not necessarily
distinct) choices of a, b, c, d.

4.  Any  of  six  further  equivalent  conditions  of  Willems  [4] 
as  paraphrased  by  Forney  and  Trott  [3,  p.  1500]. 

Example. Let C be a group code regarded as a block
code of length two, in which the coordinate p (resp., f)
of a codeword (p, f) represents the entire past (resp., future)
of that codeword. Suppose $c_1 = (a, d)$, $c_2 = (a, c)$,
and $c_3 = (b, c)$ are codewords. The combination
$c_1 c_2^{-1} c_3 = (a, d)(a^{-1}, c^{-1})(b, c) = (b, d)$ must also be a codeword, hence
the code satisfies condition (3). Since the split into past and
future can be done at an arbitrary time index, the code has a
unique minimal state realization at each time index.

The  purpose  of  this  paper  is  to  introduce  a  wider  class  of  codes 
that  also  satisfy  the  equivalent  conditions  listed  above. 

II.  Maximal  Fixed-Cost  Codes 

Let S be a right-cancellative semigroup, i.e., a semigroup in
which ax = bx implies a = b for all a, b, x ∈ S. Let A be a
product $A_1 \times A_2 \times \cdots \times A_n$ of symbol alphabets, and define
“cost functions” $\mu_i : A_i \to S$ that associate an element of S
with each symbol $a_i \in A_i$. Define the “cost” μ(a) of an
n-tuple $a = (a_1, \ldots, a_n) \in A$ as the product (in S) of coordinate
costs, i.e.,

$\mu(a_1, a_2, \ldots, a_n) = \mu_1(a_1)\,\mu_2(a_2) \cdots \mu_n(a_n).$

Similarly, for a fixed partition of an element of A into past
$p = (a_1, \ldots, a_i)$ and future $f = (a_{i+1}, \ldots, a_n)$, define
$\mu(p) = \mu_1(a_1) \cdots \mu_i(a_i)$ and $\mu(f) = \mu_{i+1}(a_{i+1}) \cdots \mu_n(a_n)$.

For a fixed cost s ∈ S, define the maximal fixed-cost code
$M_s = \{a \in A : \mu(a) = s\}$ to be the set of all possible n-tuples
from A having cost s.

Theorem. $M_s$ has a biproper trellis presentation.

Proof: For a fixed partition of codewords into past and future,
let (a, d), (a, c), and (b, c) be codewords in $M_s$. Then
μ(a)μ(c) = s = μ(b)μ(c). By right-cancellation, μ(a) = μ(b);
therefore, μ(a)μ(d) = μ(b)μ(d) = s. Since μ(b, d) = s, and
$M_s$ is maximal, (b, d) is a codeword. Thus $M_s$ satisfies
condition (3).

III.  Examples 

1.  Let $A_i = GF(q)$, let $H = [h_1, \ldots, h_n]$ be an r × n
matrix with columns $h_i$ having entries from GF(q), and let
$S = GF(q)^r$ be the vector space of r-tuples over GF(q). For
1 ≤ i ≤ n define $\mu_i(x) = x \cdot h_i$. Then $\mu(a) = aH^T$, and $M_0$
is the linear code with parity-check matrix H.

2.  Let $A_i = \{0, 1\}$, let $S = \mathbb{N}_0$ be the monoid of non-negative
integers under addition, and define $\mu_i$ by $\mu_i(0) = 0$, $\mu_i(1) = 1$.
Then, for any block length n, $M_w$ is the set of binary n-tuples
of Hamming weight w.

3.  Let $A_i = \mathbb{Z}$, let $S = \mathbb{N}_0$, and define $\mu_i$ by $\mu_i(x) = x^2$.
Then, for any block length n, $M_w$ is the set of integer n-tuples
of squared norm w, i.e., a shell in the integer lattice $\mathbb{Z}^n$.

4.  Let $A_i = \{a_1, \ldots, a_c\}$, let $S = \mathbb{N}_0^c$, the direct product of c
copies of $\mathbb{N}_0$. Define $\mu_i$ by $\mu_i(a_j) = e_j = (0, \ldots, 0, 1, 0, \ldots, 0)$,
where $e_j$ is the unit vector nonzero only in coordinate j. Then
$M_{(m_1, \ldots, m_c)}$ is the permutation code obtained by permuting
the vector of “shape” $(a_1^{m_1}, \ldots, a_c^{m_c})$ in all possible ways,
where $a_j^{m_j}$ denotes the $m_j$-tuple $(a_j, a_j, \ldots, a_j)$.
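The theorem can be checked by brute force on small instances of Examples 2 and 3. The sketch below is illustrative only: it restricts attention to additive costs in the non-negative integers, tests condition (3) in the equivalent form that the future sets of any two pasts are either equal or disjoint, truncates the integer alphabet to a finite range, and uses hypothetical function names.

```python
from itertools import product

def maximal_fixed_cost_code(alphabets, cost_fns, s):
    """All n-tuples over the given alphabets whose coordinate costs add up to s."""
    return {a for a in product(*alphabets)
            if sum(f(x) for f, x in zip(cost_fns, a)) == s}

def is_rectangular(code, n):
    """Check condition (3) at every split of the codewords into past and future."""
    for i in range(1, n):
        futures = {}
        for word in code:
            futures.setdefault(word[:i], set()).add(word[i:])
        sets = list(futures.values())
        # Rectangular at this split iff any two future sets are equal or disjoint.
        for s1 in sets:
            for s2 in sets:
                if s1 & s2 and s1 != s2:
                    return False
    return True

# Example 2: binary 4-tuples of Hamming weight 2.
weight2 = maximal_fixed_cost_code([(0, 1)] * 4, [lambda x: x] * 4, 2)

# Example 3 on a truncated alphabet: integer 3-tuples of squared norm 4.
shell = maximal_fixed_cost_code([range(-2, 3)] * 3, [lambda x: x * x] * 3, 4)

# A small code that is not maximal fixed-cost and fails condition (3).
not_rect = {(0, 0), (0, 1), (1, 0)}

results = (is_rectangular(weight2, 4), is_rectangular(shell, 3),
           is_rectangular(not_rect, 2))
```

The third code fails because the pasts 0 and 1 share the future 0 but not the future 1, which is exactly the configuration that condition (3) rules out.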
Abstract — We present a polynomial-time algorithm
which produces the optimal sectionalization of a given
trellis T for a block code C in time $O(n^2)$, where n
is  the  length  of  C .  The  algorithm  is  developed  in 
a  general  setting  of  certain  operations  and  functions 
defined  on  the  set  of  trellises;  it  therefore  applies  to 
both  linear  and  nonlinear  codes,  and  accommodates 
a  broad  range  of  optimality  criteria.  The  optimality 
criterion  based  on  minimizing  the  number  of  opera¬ 
tions  required  for  trellis  decoding  of  C  is  investigated 
in  detail:  several  methods  for  decoding  a  given  trellis 
are  discussed  and  compared  in  a  number  of  examples. 
Finally,  analysis  of  the  dynamical  properties  of  opti¬ 
mal  sectionalizations  is  presented. 

I.  Introduction 

It  is  now  well-known  [2,  3]  that  every  linear  block  code  may  be 
represented  by  a  trellis,  which  can  be  employed  for  maximum- 
likelihood  decoding  of  the  code  with  the  Viterbi  algorithm  or 
variants  thereof.  The  complexity  of  a  given  trellis  is  usually 
expressed  in  terms  of  parameters  such  as  the  number  of  states 
and/or  branches  it  contains.  While,  indeed,  these  parameters 
govern  the  complexity  of  trellis  decoding,  in  many  cases  this 
complexity may be reduced with an appropriate sectionalization
of the trellis. By a sectionalization we mean the choice of
the symbol alphabet at each time index: for a given order of
the time axis I, the sectionalization shrinks I at the expense of
increasing the code alphabet [2]. A wide variety of such
granularity adjustments is possible, and each may substantially
affect the decoding complexity. For a given code C of length n
and a given order of its time axis I, a specific sectionalization
of its trellis T is determined by the set $\{h_0, h_1, \ldots, h_\nu\} \subseteq I$
of section boundaries, where $h_0 = 0 < h_1 < \cdots < h_\nu = n$.

Clearly, there are $2^{n-1}$ possible ways to select the section
boundaries, and the sectionalization problem consists of finding
the optimal choice among the $2^{n-1}$ possibilities. Examples
of  specific  ‘good’  sectionalizations  for  particular  codes  may  be 
found  in  [2,  3]  among  other  works.  However,  at  the  present 
time,  finding  the  best  sectionalization  is  more  akin  to  ‘art’ 
than  to  exact  science:  no  systematic  method  for  finding  the 
optimal  sectionalization  of  a  given  trellis  is  presently  known. 

II.  The  Sectionalization  Algorithm 

In this work, we present a complete solution to the general
sectionalization problem. Namely, we describe a polynomial-time
algorithm  which  produces  the  optimal  sectionalization  from  a 
given  generator  matrix  of  the  code.  The  algorithm  is  devel¬ 
oped  in  a  general  setting  of  certain  operations  and  functions  on 
the  set  of  trellises.  In  particular,  we  generalize  to  some  extent 
the  usual  definition  of  a  trellis.  We  then  define  the  operations 
of  composition  and  amalgamation  of  trellises.  This  enables 
us  to  consider  a  class  of  functions  defined  on  the  set  of  trel¬ 
lises,  that  satisfy  a  certain  linearity  property  with  respect  to 

‘This  work  was  supported  by  the  NSF  Grant  NCR-9501345 


the  composition  operation.  We  then  seek  a  sequence  of  amal¬ 
gamations  and  compositions  that  minimize  the  value  of  an 
arbitrary  given  function  from  this  class.  We  show  that  finding 
such  a  sequence  is  equivalent  to  finding  the  minimum-weight 
path  in  a  certain  weighted  digraph.  This  may  be  accomplished 
using the well-known Dijkstra algorithm [1]. Thus, to find the
sectionalization of a given trellis T which minimizes the value
of an arbitrary given function F(T), we construct the corresponding
sectionalization digraph Q, and then apply a variant
of Dijkstra’s algorithm to Q.
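
As a rough illustration of the reduction described above, the following sketch treats the n + 1 time indices as nodes of a digraph with an edge (i, j) for every candidate section i < j, and finds the cheapest path from 0 to n. Because every edge points forward in time, a simple dynamic program is used here in place of Dijkstra's algorithm. The section cost `toy_cost` and the profile `SDP` are hypothetical stand-ins for an additive function F(T); they are not the operation counts used in this work.

```python
from itertools import combinations

def optimal_sectionalization(n, cost):
    """Cheapest path 0 -> n in the digraph with an edge (i, j) for all i < j."""
    best = [0.0] + [float("inf")] * n   # best[j]: min cost of sectionalizing [0, j]
    prev = [None] * (n + 1)             # prev[j]: boundary preceding j on that path
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], prev[j] = c, i
    bounds = [n]
    while bounds[-1] != 0:              # walk back from n to 0
        bounds.append(prev[bounds[-1]])
    return best[n], bounds[::-1]

def brute_force(n, cost):
    """Try all 2^(n-1) choices of interior boundaries (small n only)."""
    best = float("inf")
    for r in range(n):
        for inner in combinations(range(1, n), r):
            b = (0, *inner, n)
            best = min(best, sum(cost(b[t], b[t + 1]) for t in range(len(b) - 1)))
    return best

# Hypothetical additive cost: section alphabet size plus states at the section end.
SDP = [0, 1, 2, 3, 2, 3, 2, 1, 0]       # assumed state dimension profile, n = 8

def toy_cost(i, j):
    return 2 ** (j - i) + 2 ** SDP[j]

total, boundaries = optimal_sectionalization(8, toy_cost)
```

The dynamic program visits O(n^2) edges, whereas the exhaustive search visits all 2^(n-1) boundary sets; on small inputs the two can be compared directly as a sanity check.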

III.  Examples 

The  general  sectionalization  algorithm  described  above  applies 
to  both  linear  and  nonlinear  codes  and  easily  accommodates 
a  broad  range  of  optimality  criteria.  However,  herein,  we  fo¬ 
cus  on  the  optimality  criterion  based  on  the  total  number  of 
real  additions  and  comparisons  required  for  decoding  the  trel¬ 
lis.  This  criterion  conforms  to  the  well-established  tradition 
of  counting  decoding  complexity  [2,  3].  For  instance,  for  the 
(24, 12, 8) binary Golay code C24, we obtain the sectionalization
with boundaries at 0, 8, 16, 24, which coincides with the
one given by Forney [2] and proves that this sectionalization
is indeed optimal for trellis decoding of C24. The number
of  decoding  operations  we  obtain  for  this  sectionalization  is 
1339,  which  is  slightly  less  than  the  number  reported  by  For¬ 
ney  in  [2],  and  considerably  less  than  the  complexity  of  the 
earlier  algorithms.  Notably,  all  the  previous  Golay  decoders 
have  been  specifically  ‘tailored’  for  C24,  whereas  our  decoder 
is  the  output  of  a  general-purpose  computer  program  which 
applies  uniformly  well  to  any  linear  code.  Other  examples 
include  Reed-Muller  codes,  BCH  codes,  Shearer  codes,  and 
quadratic-residue  codes.  In  particular,  for  all  the  primitive 
BCH  codes  of  length  <  64  we  improve  upon  the  decoding 
complexities  reported  in  [3]. 

IV.  Dynamics  of  Optimal  Sectionalizations 

Although  our  algorithm  readily  produces  the  optimal  section¬ 
alization,  it  provides  little  insight  as  to  how  it  relates  to  the 
dynamical  properties  of  the  code.  Thus,  we  also  investigate, 
under certain simplifying assumptions, the dynamics of optimal
sectionalizations. For instance, we show that if the dimensions
of both past and future subcodes change at a given
position $t \in \mathcal{I}$, then t is necessarily a section boundary in the
optimal  sectionalization.  Furthermore,  in  any  section  of  the 
optimal  sectionalization  the  dimension  of  either  the  past  or 
the  future  or  both  must  be  constant. 

Abstract - We present a new lower bound on the
state-complexity  of  linear  codes,  which  includes  all  the 
existing  bounds  as  special  cases.  For  a  large  number 
of  codes  this  results  in  a  considerable  improvement 
upon  the  DLP  bound.  Moreover,  we  generalize  the 
new  bound  to  nonlinear  codes,  and  introduce  several 
alternative  techniques  for  lower  bounding  the  trellis 
complexity,  based  on  the  distance  spectrum  and  other 
combinatorial  properties  of  the  code.  We  also  show 
how  our  techniques  may  be  employed  to  lower  bound 
the  maximum  and  the  total  number  of  branches  in  the 
trellis.  The  asymptotic  behavior  of  the  new  bound  is 
investigated  and  shown  to  improve  upon  the  known 
asymptotic  estimates  of  trellis  complexity. 

I.  Introduction 

The trellis state-complexity of a linear code C over GF(q) is
defined as $s = \max_{i \in \mathcal{I}} \{\log_q |S_i|\}$, where $S_i$ is the set of states
at time $i$ in the minimal trellis for C. Perhaps the earliest
known lower bound on s is due to Muder [4]: for an (n, k, d) linear
code, $s \ge k - \min_{i \in \mathcal{I}} \{K(i,d) + K(n-i,d)\}$, where K(n,d)
is the largest possible dimension of a linear code of length n
and minimum distance d. This bound was improved by several
authors, giving $s \ge k - \min_{i \in \mathcal{I}} \{k(i;C) + k(n-i;C)\}$, where
$k(i;C)$ is the dimension-length profile (DLP) of C, i.e., the
maximum dimension of any subcode of C of support size $i$. All
these bounds are based on the common idea of dividing the
time axis for the code into two sections, the past and the future,
and then bounding the dimension of the resulting state-space
using any of the known upper bounds on the dimension
of the past and future subcodes. In [3] we have recently derived
a conceptually different bound $s \ge \lceil k(d-1)/n \rceil$, based
on dividing the time axis into $\lceil n/(d-1) \rceil$ sections, and using
the fact that there can be no parallel transitions in a trellis
section of length less than d.

II.  Lower  Bound  on  State  Complexity 

In  this  work  we  present  a  new  lower  bound  on  s,  which  includes 
all  the  existing  bounds  as  special  cases.  The  new  bound  is 
obtained  by  partitioning  the  time  axis  for  C  into  several  - 
that  is,  generally  more  than  two  -  sections  and  then  selecting 
the  partition  which  yields  the  best  lower  bound. 

Theorem 1. Let $l_1, l_2, \ldots, l_L$ be any set of positive integers,
with $l_1 + l_2 + \cdots + l_L = n$. Then

$$ s \ge \left\lceil \frac{k - k(l_1;C) - k(l_2;C) - \cdots - k(l_L;C)}{L-1} \right\rceil $$

For a great many codes this results in a substantial improvement
upon the DLP bound. We have applied the proposed
technique to all the 8128 best known binary linear codes of
length < 128, and obtained over 3400 improvements over the
DLP bounds. For a complete summary of our results, send
e-mail to trellis@golay.csl.uiuc.edu.
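
To make the bound of Theorem 1 concrete, the sketch below computes the dimension-length profile of a small binary code by exhaustive search and then maximizes the bound over all L >= 2 and all choices of section lengths. The (8, 4, 4) generator matrix `G_RM` is an assumed example, and the helper names are hypothetical; the brute-force search is only practical for very short codes.

```python
from itertools import combinations
from math import ceil

def codewords(gens):
    """All codewords of the binary code spanned by the rows in `gens`."""
    words = {tuple([0] * len(gens[0]))}
    for g in gens:
        words |= {tuple(a ^ b for a, b in zip(w, g)) for w in words}
    return words

def dlp(words, n):
    """k(i; C) for i = 0..n: max dimension of a subcode supported on i positions."""
    profile = []
    for i in range(n + 1):
        best = 0
        for J in combinations(range(n), i):
            inside = sum(1 for w in words
                         if all(w[p] == 0 for p in range(n) if p not in J))
            best = max(best, inside.bit_length() - 1)   # log2 of a power of two
        profile.append(best)
    return profile

def partitions(n, parts, largest=None):
    """All multisets of `parts` positive integers summing to n."""
    largest = n if largest is None else largest
    if parts == 1:
        if 1 <= n <= largest:
            yield (n,)
        return
    for first in range(min(n - parts + 1, largest), 0, -1):
        for rest in partitions(n - first, parts - 1, first):
            yield (first, *rest)

def theorem1_bound(k, profile, n):
    """Best bound over all L >= 2 and all l_1 + ... + l_L = n."""
    return max(ceil((k - sum(profile[l] for l in ls)) / (L - 1))
               for L in range(2, n + 1) for ls in partitions(n, L))

# Assumed example: the (8, 4, 4) first-order Reed-Muller code.
G_RM = [[1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 1, 1, 0, 0, 1, 1],
        [0, 1, 0, 1, 0, 1, 0, 1]]
profile = dlp(codewords(G_RM), 8)
bound = theorem1_bound(4, profile, 8)
```

For this particular code the maximum is already attained with L = 2, i.e. by the DLP bound itself; the improvements reported in the text arise for other codes.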

III. Bounds on Branch and Edge Complexity

Trellis branch-complexity was defined by Forney in [1] as
$b = \max_{i \in \mathcal{I}} b_i$, where $b_i$ is the logarithm of the number of
branches in the trellis section corresponding to time $i \in \mathcal{I}$. The
state-complexity bounds of Theorems 1 and 2 can be translated
into a lower bound on b, which is often much tighter
than the obvious statement $b \ge s$.

Theorem 3. Let $l_1, l_2, \ldots, l_L$ be positive integers, such that
$l_1 + l_2 + \cdots + l_L = n - L + 1$. Then

$$ b \ge \left\lceil \frac{k - k(l_1;C) - k(l_2;C) - \cdots - k(l_L;C)}{L-1} \right\rceil $$

The  new  bound  on  branch-complexity  was  again  applied  to  all 
the  best-known  binary  linear  codes  of  length  <  128,  yielding 
over  3300  improvements  over  the  DLP  bound.  Notably,  in 
2621 out of the 3300 cases, the lower bound on b is strictly
greater than the lower bound on s. In addition, we derive
lower bounds on the total number of branches in the trellis,
the trellis edge-complexity E(C) as defined in [2]. The bounds
follow by solving a nonlinear integer programming problem
with linear constraints, which arise from the general relations
between the values of $b_i$ derived in the proof of Theorem 3.

IV.  Asymptotics 

As shown in [3], for a sequence of codes of increasing length n,
with rate fixed at R and relative minimum distance fixed at
$d/n = \delta$, the state-complexity is bounded by $c_1 n \le s \le c_2 n$
for some constants $c_1$ and $c_2$ independent of n. The results
of [3] establish $c_1 \ge \delta R$, while the work of [5] shows that
$c_1 \ge R - R_{\max}(2\delta)$, where $R_{\max}(\cdot)$ is the function describing
the JPL upper bound. Herein we prove

Theorem 4. Let $\varsigma = s/n$. Then for $n \to \infty$ and for all
$L = 2, 3, \ldots$,

$$ \varsigma \gtrsim \frac{R - R_{\max}(L\delta)}{L-1} $$

The theorem produces a countably infinite family of lower
bounds on $\varsigma$, and it is easy to see that the apparently dissimilar
bounds of [3] and [5] are in fact the extreme members of this
family corresponding to $L \approx 1/\delta$ and $L = 2$, respectively.
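
The family of bounds in Theorem 4 can be tabulated numerically once a specific upper bound $R_{\max}$ is fixed. The sketch below assumes that the "JPL upper bound" is the first McEliece-Rodemich-Rumsey-Welch bound, $R_{\max}(\delta) = H_2(1/2 - \sqrt{\delta(1-\delta)})$; that identification, the function names, and the sample values R = 0.5 and delta = 0.1 are assumptions made for illustration only.

```python
from math import log2, sqrt

def h2(p):
    """Binary entropy function, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def r_max(delta):
    """Assumed R_max: first MRRW upper bound on rate at relative distance delta."""
    if delta >= 0.5:
        return 0.0
    return h2(0.5 - sqrt(delta * (1.0 - delta)))

def family_bound(rate, delta, L):
    """Right-hand side of the Theorem 4 bound for a given number of sections L."""
    return (rate - r_max(L * delta)) / (L - 1)

# Sample point (assumed): R = 0.5, delta = 0.1, L = 2 .. 10.
table = {L: family_bound(0.5, 0.1, L) for L in range(2, 11)}
best_L = max(table, key=table.get)
```

At this sample point the strongest member of the family is neither L = 2 nor L close to 1/delta, which illustrates why intermediate values of L are worth considering.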

Abstract - The growth of trellis diagrams of lattices ...
that this growth is exponential in terms of the coding
gain.

... relations between trellis complexity, the minimum distance
and the dimension of linear block codes. This work reports a
parallel development for lattices.

II.  Preliminaries 

Let $\mathcal{L}$ denote the set of all lattices having a finite trellis
diagram. Let $L \in \mathcal{L}$ and let n denote the dimension of L. Let C(L)
denote the category of all the finite trellis diagrams for L; then
C(L) is nonempty. Let S and B denote the minimum number
of states and branches, respectively, of elements of C(L). Consider
the sum of the cardinalities of the label groups for each
element of C(L) and let G denote the minimum of these sums
in C(L). Define S(L), the average state trellis complexity of
L, to be (S - 1)/n and B(L), the average branch trellis complexity
of L, to be B/n. Define G(L), the average label group
complexity of L, to be G/n. For any lattice L, let $\delta(L)$ denote
the coding gain of L. Then

$T_1(\gamma) = \inf\{S(L) \mid \delta(L) \ge \gamma \text{ and } L \in \mathcal{L}\}$,

$T_2(\gamma) = \inf\{B(L) \mid \delta(L) \ge \gamma \text{ and } L \in \mathcal{L}\}$,

$T_3(\gamma) = \inf\{G(L) \mid \delta(L) \ge \gamma \text{ and } L \in \mathcal{L}\}$,

referred to as the state trellis complexity, the branch trellis
complexity and the label trellis complexity functions,
respectively.

Since $T_i$, $i = 1, 2, 3$, represent the best trade-off between
trellis complexity and gain, it is essential to establish bounds
on the behavior of these functions.

The following results are established in [1].

1: $T_i = 1 + C_i \ln(\gamma)$ for constants $C_i$, $i = 1, 2, 3$, whenever
the coding gain is close to 1.

2: $T_1$ and $T_2$ grow exponentially when $\gamma$ is large.

3: $T_3$ grows at most linearly.

4: $T_1(\gamma) \ge (\gamma/\gamma_r)^{r/2}$, $r = 1, 2, 3, \ldots$, where $\gamma_r$ denotes the
coding gain of the densest lattice in r dimensions.

5: $T_2(\gamma) \ge \gamma^{(r+1)/2} / \gamma_r^{r/2}$, $r = 1, 2, 3, \ldots$

6: $T_1$ is bigger than any of the DLP bounds evaluated at $\gamma$.

7: A random coding argument was then applied to show
that the bounds given above cannot be much improved.

The above results imply that the Viterbi algorithm, applied
to the trellis diagrams of lattices, has exponential running
time.

^This  work  was  supported  by  the  National  Sciences  and  Engi¬ 

Abstract - In this paper we present a systematic
method  for  combining  rotational  invariance  and  punc¬ 
tured  codes  for  use  with  TCM,  and  discuss  a  new  per¬ 
spective  on  the  class  of  punctured  codes. 

I.  Introduction 

Trellis  coded  modulation  (TCM)  based  QAM  systems,  such 
as  those  found  in  the  V.32  and  V.34  telephone  modem  stan¬ 
dards,  incorporate  rotational  invariance  to  provide  benefits 
with  respect  to  absolute  phase  reference  and  phase  noise  pro¬ 
tection.  In  binary  convolutional  code  based  QPSK  systems, 
the  computational  savings  and  code  rate  flexibility  of  punc¬ 
tured  coding  is  well  known  and  used  to  great  advantage.  How¬ 
ever,  these  systems  do  not  maintain  rotational  invariance  and 
suffer  from  the  lack  of  this  property. 

Until  recently,  a  systematic  method  of  combining  both  rota¬ 
tional  invariance  and  puncturing  in  a  general  framework  was 
unknown.  In  this  paper  we  present  a  rotationally  invariant 
encoding/uncoding  structure  that  can  use  punctured  codes. 
Parts  of  this  work  are  related  to  [4,  5]. 

II.  Background 

Rotationally  invariant  (RI)  trellis  codes  are  important 
whenever  the  modulation  signal  set  has  a  two-dimensional  ro¬ 
tational  symmetry  and  the  transmission  system  can  introduce 
a  phase  ambiguity  at  the  receiver.  A  trellis  code  is  RI  if  the 
componentwise  rotation  of  a  code  sequence  is  always  another 
code  sequence  in  the  code  (cf.,  [1]).  RI  trellis  codes  with  RI 
encoders/uncoders  are  highly  desirable  as  a  method  of  han¬ 
dling  90°  phase  ambiguities  as  they  have  the  property  that 
the  output  of  the  uncoder  for  any  codeword  is  the  same  as  the 
output  when  the  codeword  is  first  rotated  by  0°,  90°,  180°  or 
270°  before  being  presented  to  the  uncoder. 

A  punctured  convolutional  code  is  a  high-rate  code  ob¬ 
tained  from  a  lower-rate  code  by  periodically  eliminating,  i.e., 
“puncturing”  specific  symbols  from  the  lower-rate  codeword 
[2,  3].  The  resulting  punctured  code  depends  on  both  the  orig¬ 
inal  code,  and  on  the  number  and  locations  of  the  symbols  to 
be  deleted. 

III.  Trellis  Coding 

This work describes a method of encoding, using any transparent
binary convolutional code (BCC), that results in an RI
trellis code for applications to QPSK and QAM modulation.
This method incorporates three components: (1) a transparent
BCC, (2) a two-dimensional signal space labeling, and (3) a
precoding/postcoding function.

Transparent codes and transparent encoders/uncoders are
highly desirable as a method of handling 180° phase ambiguities.
A BCC is said to be transparent if the complement of any

1This  work  was  supported  in  part  by  NSF  grant  NCR-9207331 
and  by  the  United  States  Army  Research  Office  through  the  Army 
Center  of  Excellence  for  Symbolic  Methods  in  Algorithmic  Mathe¬ 
matics  (ACSyAM),  Mathematical  Sciences  Institute  of  Cornell  Uni¬ 
versity,  Contract  DAAL03-91-C-0027. 

2  erl4@cornell.edu,  rowe@ee.cornell.edu,  heegard@ee.cornell.edu 


codeword is always a codeword. Every transparent code has
a transparent encoder/uncoder with the property that even if
the codeword is complemented the uncoder will produce the
correct sequence.

The QPSK and QAM signal sets need to be labeled in such
a way that: (1) the two least significant bits, $(I_j, Q_j)$, satisfy
$(I_j, Q_j) \to (\bar{Q}_j, I_j)$ under 90° rotation, where the overbar
denotes the binary complement, and (2) the remaining
most significant bits are invariant to 90° rotation.

The following mapping describes the required precoder/
postcoder structure. Precoder equations:

$x_j = w_j + x_{j-1} + z_j (x_{j-1} + y_{j-1})$,

$y_j = z_j + w_j + y_{j-1} + z_j (x_{j-1} + y_{j-1})$.

Postcoder equations:

$w_j = x_j + y_{j-1} + (x_j + y_j)(x_{j-1} + y_{j-1})$,

$z_j = y_j + x_j + y_{j-1} + x_{j-1}$.

Note that: (1) the postcoder inverts the precoder, (2) the
output of the postcoder is the same under the map $(x_j, y_j) \to
(\bar{y}_j, x_j)$ (or any integer power of this map), (3) the postcoder
function is feedback free and thus limits error propagation.
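
Properties (1) and (2) can be checked exhaustively over all binary inputs. The sketch below assumes that the precoder and postcoder equations are evaluated modulo 2 and that a 90° rotation acts on the two least significant bits as $(x, y) \to (\bar{y}, x)$; the function names are hypothetical.

```python
from itertools import product

def precode(w, z, xp, yp):
    """One precoder step; xp, yp are the previous outputs x_{j-1}, y_{j-1}."""
    t = z & (xp ^ yp)
    return w ^ xp ^ t, z ^ w ^ yp ^ t

def postcode(x, y, xp, yp):
    """One postcoder step; uses only the current and previous channel bits."""
    return x ^ yp ^ ((x ^ y) & (xp ^ yp)), y ^ x ^ yp ^ xp

def rotate(x, y):
    """Assumed effect of a 90-degree rotation: (x, y) -> (complement of y, x)."""
    return y ^ 1, x

# (1) The postcoder inverts the precoder for every input and every previous state.
inverts = all(postcode(*precode(w, z, xp, yp), xp, yp) == (w, z)
              for w, z, xp, yp in product((0, 1), repeat=4))

# (2) Rotating the current and previous symbols leaves the postcoder output unchanged.
invariant = all(postcode(*rotate(x, y), *rotate(xp, yp)) == postcode(x, y, xp, yp)
                for x, y, xp, yp in product((0, 1), repeat=4))
```

Since invariance under one application of the map implies invariance under its powers, the single check covers rotations by 90°, 180° and 270°. Property (3) is visible in the signature of `postcode`, which takes no previously decoded outputs as arguments.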

In  the  encoder,  the  two  binary  outputs  of  the  precoder 
are  independently  encoded  with  separate  transparent  BCC 
encoders.  The  BCC  outputs,  along  with  the  remaining  un¬ 
coded  (or  parallel  edge)  information,  are  combined  to  select 
the  QAM  constellation  point  to  be  transmitted.  The  mapping 
is  such  that  (1)  the  BCC  outputs  independently  select  the  LSB 
of  the  I  and  Q  components  and  (2)  parallel  edge  information 
is  RI. 

By using the above encoding method with transparent
punctured BCCs, the resulting structure is a punctured,
rotationally invariant trellis code. The cost of this technique is a
doubling of data memory in the Viterbi decoder.

We  will  also  present  an  alternative  view  of  punctured  codes 
from  the  perspective  of  multirate  systems. 

Abstract - This summary outlines certain results
about  trellis  structures  of  linear  block  codes  that 
achieve  the  highest  speed  of  decoding  while  satisfying 
a  constraint  on  the  structural  complexity  of  the  trel¬ 
lis  in  terms  of  the  maximum  number  of  states  at  any 
particular  depth.  An  upper  bound  on  the  number  of 
parallel  isomorphic  subtrellises  in  a  proper  trellis  for  a 
code  without  exceeding  the  maximum  state  space  di¬ 
mension  of  the  minimal  trellis  of  the  code  is  derived. 
The  complexity  of  VLSI  implementation  of  a  Viterbi 
decoder  based  on  an  L-section  trellis  diagram  for  a 
code  is  analyzed  and  certain  descriptive  parameters 
are  introduced.  It  is  shown  that  a  VLSI  chip  Viterbi 
decoder  based  on  a  non-minimal  trellis  requires  less 
area  and  is  capable  of  operation  at  higher  speed  than 
one  based  on  the  minimal  trellis  when  the  commonly 
used  ACS-array  architecture  is  considered. 

I.  Introduction 

Much  effort  has  been  expended  on  minimizing  the  num¬ 
ber  of  states  in  a  trellis  for  a  block  code  by  considering  all 
possible  permutations  of  the  bits  of  the  code  and  also  on  min¬ 
imizing  the  number  of  operations  required  to  decode  a  received 
vector  using  a  trellis  for  the  code.  If  decoding  is  performed 
using  a  stored  program  that  is  executed  sequentially,  then  this 
approach  will  lead  to  the  fastest  speed  of  decoding.  However, 
if  decoding  is  performed  using  a  VLSI  chip,  then  the  above 
approach  fails  and  an  alternative  approach  is  more  suitable. 
Given  a  constraint  on  the  amount  of  hardware  (determined 
by  the  number  of  states  and  the  complexity  of  branches)  in 
the  decoder,  decoding  must  be  done  as  fast  as  possible; 
not  with  as  few  computations  as  possible.  In  [3],  we 
have  derived  properties  of  the  structure  of  this  non-minimal 
trellis  which  show  that  a  non-minimal  trellis  implementa¬ 
tion  requires  less  area  in  the  VLSI  chip  than  the  minimal 
trellis  implementation  when  the  prominent  ACS-array  archi¬ 
tecture  is  assumed  [2]. 

II.  Constrained  Parallelism 

We show how to build a trellis for a linear block code
which is a disjoint union of a certain desired number of parallel
isomorphic subtrellises. Although this trellis is not minimal,
its state space dimension at every depth is less than or equal to
the maximum state space dimension of the minimal trellis. Let
$\{s_0, s_M, \ldots, s_{LM}\}$ denote the state dimension profile (SDP) of
the L-section, M bits/section minimal trellis of an (N, K) linear
block code C and let $s_{\max,L}(C)$ be the largest among them. Let
G be the trellis oriented generator matrix of an (N, K) linear
block code C [1]. Let $r = (r_1, r_2, \ldots, r_N)$ be a typical row of
G. Then, we define the span of r, denoted span(r), to be
the smallest interval $[i, j]$, $1 \le i \le j \le N$, which contains all
the non-zero elements of r. For a row r whose span is $[i, j]$ we
also define an active span of r, denoted aspan(r), as aspan(r) = $[i, j-1]$
if $i < j$ and aspan(r) = $\emptyset$ if $i = j$. Define the non-empty set,

$I_{\max}(C) = \{l : s_l(C) = s_{\max,L}(C)\}$ (1)

Let R(C) be the following subset of rows of G,

$R(C) = \{r \in G : \mathrm{aspan}(r) \supseteq I_{\max}(C)\}$ (2)

Let $d = |R(C)|$, where $|Q|$ denotes the cardinality of any finite
set Q.

Theorem 1: With R(C) defined as above and $d = |R(C)|$,
let $1 \le d' \le d$. There exists a subcode C' of C such that
$s_{\max,L}(C') = s_{\max,L}(C) - d'$ and $\dim(C') = \dim(C) - d'$
if and only if there exists a subset $R' \subseteq R(C)$ consisting
of d' rows of R(C) such that for every $l$ satisfying $s_l(C) >
s_{\max,L}(C')$, there exist at least $s_l(C) - s_{\max,L}(C')$ rows in R'
whose active spans contain $l$. The set of coset representatives
[C/C'] is generated by R'.

The utility of the above theorem is that it
shows how to choose a subcode C' of C with $s_{\max,L}(C') =
s_{\max,L}(C) - \dim([C/C'])$, such that a non-minimal trellis T
for C with maximum state space dimension $s_{\max,L}(C)$,
which is the union of $2^{\dim([C/C'])}$ parallel subtrellises $T_i$, each
isomorphic to the minimal trellis for C', can be built.

Upper Bound on Parallelism: The smallest such subcode has
dimension lower bounded by $\dim(C) - |R(C)|$, i.e., the maximum
number of parallel subtrellises one can obtain with the
constraint that the total state space dimension never exceeds
$s_{\max}(C)$ is upper bounded by $2^{|R(C)|}$ with R(C) as defined
above.

Parallelism of the Minimal Trellis: The logarithm
to the base 2 of the number of parallel isomorphic subtrellises
in a minimal L-section trellis for a binary (N, K) linear
block code is given by the number of rows in its trellis oriented
generator matrix whose active spans contain the integers
$\{M, 2M, \ldots, (L-1)M\}$, where N = LM.
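
The quantities defined above are easy to compute from a trellis oriented generator matrix. The following sketch evaluates spans, active spans, the state dimension profile, and the row set R(C) for an assumed trellis oriented generator matrix `TOGM` of the (8, 4) first-order Reed-Muller code; the function names are hypothetical, and the state dimension at a depth is taken to be the number of rows active at that depth.

```python
def span(row):
    """1-based span [i, j] of the nonzero entries of a row."""
    nz = [p + 1 for p, bit in enumerate(row) if bit]
    return nz[0], nz[-1]

def aspan(row):
    """Active span [i, j-1] as a set of depths (empty when i == j)."""
    i, j = span(row)
    return set(range(i, j))

def sdp(G, M):
    """State dimensions at depths 0, M, 2M, ..., N from a trellis oriented G."""
    N = len(G[0])
    return [sum(1 for r in G if l in aspan(r)) for l in range(0, N + 1, M)]

def parallel_rows(G, M):
    """Indices of rows in R(C): active span contains every max-dimension depth."""
    N = len(G[0])
    profile = sdp(G, M)
    i_max = {l for l, s in zip(range(0, N + 1, M), profile) if s == max(profile)}
    return [idx for idx, r in enumerate(G) if i_max <= aspan(r)]

# Assumed example: trellis oriented generator matrix of the (8, 4) RM code.
TOGM = [[1, 1, 1, 1, 0, 0, 0, 0],
        [0, 1, 0, 1, 1, 0, 1, 0],
        [0, 0, 1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 1, 1, 1, 1]]

bit_level = parallel_rows(TOGM, 1)    # 8 sections of 1 bit
four_section = parallel_rows(TOGM, 2) # 4 sections of 2 bits
```

Under these assumptions the number of parallel subtrellises is bounded by 2 raised to the length of the returned list, so the sectionalization chosen changes the bound.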

Abstract - A class of methods for soft-decision decoding
of  linear  block  codes,  referred  to  as  Reconfigurable  Trellis  (RT) 
decoding,  is  presented.  In  RT  decoding  a  reduced  trellis  (or  tree) 
search  is  facilitated  by  carrying  out  the  search  on  a  reconfigured 
trellis  (or  tree)  that  corresponds  to  an  equivalent  code.  The  equiv¬ 
alent  code  is  formed  by  reordering  the  received  symbols  according 
to their reliabilities. Consequently, the trellis reconfiguration is
determined ‘on-the-fly’, but only a small portion of the trellis needs
to  be  constructed,  as  guided  by  the  reduced  search.  The  search 
efficiency  improves  for  channels  where  the  soft-decisions  provide 
a  good  indication  of  which  symbols  are  in  error.  For  example,  us¬ 
ing  the  M  algorithm  on  an  erasure  channel,  only  a  single  survivor 
(i.e.  M  =  1)  is  sufficient  to  attain  maximum-likelihood  decod¬ 
ing  of  maximum-distance  codes.  For  more  typical  channels,  we 
present  simulation  results  and  a  detailed  assessment  of  the  number 
of  metric  and  binary-vector  operations  for  the  M  algorithm. 

Summary

We  discuss  a  class  of  methods  for  soft-decision  decoding  of 
linear  block  codes  that  utilize  reconfigured  trellises  (or  trees).  By 
a  reconfigured  trellis,  we  mean  that  the  trellis  used  for  decoding 
corresponds  to  an  equivalent  code  obtained  by  a  reordering  of  the 
symbol  positions  to  exploit  their  differing  reliabilities.  Some  recent 
works  also  utilize  reduced  searches  on  specially  generated  trellises 
or  trees  [1][2][3]  (see  also  [4]).  Of  these  works,  [1][2]  do  not 
reconfigure  the  code  trellis.  The  work  reported  herein,  while  it 
uses  a  similar  trellis  reconfiguration  to  that  of  [3],  was  developed 
independently  [5][6].  Both  [2]  and  [3]  focus  on  ML  decoding, 
while  here  we  emphasize  that  trellis  reconfiguration  may  facilitate 
many  types  of  reduced  searches,  and  concentrate  in  particular  on 
the  M  algorithm. 

The  number  of  branches  explored  during  reduced  searches  of 
trellises  is  decreased  by  exploring  paths  that  are  most  likely  to  be 
part  of  the  maximum-likelihood  path  (MLP),  while  discarding  those 
paths  that  are  unlikely  to  belong  to  the  MLP  as  early  in  the  search 
as  possible.  The  key  observation  is  that  few  branches  would  need 
to  be  explored  if  the  rank  order  of  path  metrics  rapidly  converged 
with  depth  to  their  final  values.  In  other  words,  a  reduced  search 
algorithm  could  stop  any  further  exploration  of  a  path  relatively 
early in the search, without losing the MLP, if the influence of the
unexplored  branch  metrics  on  the  rank  order  of  the  path  metrics  was 
insignificant.  Since  reliable  symbol-positions  have  one  symbol- 
hypothesis  that  is  much  more  likely  than  its  alternatives,  and 
since  unreliable  symbol-positions  have  little  distinction  between 
alternative  symbol-hypotheses,  reconfiguring  the  symbol  positions 
in  a  most  reliable  symbol  first  (MRSF)  manner  should  increase 
the  rate  of  convergence  with  depth  of  the  rank  order  of  path 
metrics.  In  other  words,  using  MRSF  ordering  should  enable  the 
path  exploration  to  rapidly  gather  and  utilize  the  most  significant 
branch-likelihood  information,  regardless  of  the  type  of  reduced 
search  being  used. 

RT decoding will tend to collect channel errors into a burst in
the  later  depths  (tail)  of  the  trellis,  thus  ‘trapping’  many  errors.  That 
is,  many  errors  can  fall  in  the  parity  symbol  positions  in  the  tail, 
and  for  such  positions  there  is  only  one  branch  leaving  each  node, 
which  constrains  the  search  while  conveniently  ignoring  the  errors. 
The  number  of  errors  that  will  be  trapped  by  RT  decoding  depends 
on  how  accurately  the  soft-decisions  indicate  the  error  positions. 
For  example,  if  we  are  fortunate  enough  to  have  a  channel  that 
is  extremely  well  approximated  by  an  erasure  channel,  then  RT 
decoding  will  collect  all  erasures  in  the  tail.  In  the  case  of  a 
maximum-distance  code  on  an  erasure  channel,  if  there  are  n  -  k 
or  fewer  erasures  they  will  all  be  in  the  final  n  -  k  positions  of 
the  reconfigured  trellis,  and  these  final  n  —  k  =  dmin  -  1  positions 
will  hold  parity  symbols.  This  leaves  a  correct  information  set  from 
which  the  correct  codeword  will  be  formed,  thus  ensuring  that  ML 
decoding can be attained by retaining only one survivor, regardless
of the size of the code. Similarly, for non-maximum-distance
codes, we can be assured of correcting any pattern of dmin - 1 errors
with only one survivor.
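
The one-survivor case can be illustrated without building a trellis. The sketch below is not the RT-M algorithm itself; it is the information-set re-encoding that a single-survivor search amounts to when erasures are pushed to the tail. Positions are visited in most-reliable-first order, the first k linearly independent ones are taken as the information set, and the codeword is re-encoded from the received bits there. The function name, the (8, 4, 4) generator matrix, and the 0/1 reliabilities modelling an erasure channel are assumptions.

```python
def mrsf_decode(G, hard, reliability):
    """Re-encode from the k most reliable independent positions (one survivor)."""
    k, n = len(G), len(G[0])
    order = sorted(range(n), key=lambda p: -reliability[p])   # most reliable first
    rows = [row[:] for row in G]
    info = []                                  # information positions, in pivot order
    for p in order:
        if len(info) == k:
            break
        r = len(info)
        pivot = next((i for i in range(r, k) if rows[i][p]), None)
        if pivot is None:                      # position depends on those already chosen
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(k):                     # Gauss-Jordan step over GF(2)
            if i != r and rows[i][p]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        info.append(p)
    word = [0] * n
    for i, p in enumerate(info):               # rows[i] is 1 at info[i], 0 at other info positions
        if hard[p]:
            word = [a ^ b for a, b in zip(word, rows[i])]
    return word

G = [[1, 1, 1, 1, 1, 1, 1, 1],
     [0, 0, 0, 0, 1, 1, 1, 1],
     [0, 0, 1, 1, 0, 0, 1, 1],
     [0, 1, 0, 1, 0, 1, 0, 1]]
sent = [0, 1, 0, 1, 1, 0, 1, 0]
received = [0, 1, 0, 0, 1, 0, 0, 0]        # positions 0, 3, 6 erased and filled with 0
reliability = [0, 1, 1, 0, 1, 1, 0, 1]     # erased positions get reliability 0
decoded = mrsf_decode(G, received, reliability)
```

With dmin - 1 = 3 erasures, any five unerased positions of this code contain an information set, so the erased symbols never enter the re-encoding step.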

We  focus  on  suboptimal  soft-decision  decoding  to  explore  the 
trade-off  between  the  coding  gain  attained  and  the  computational 
effort  expended.  A  ‘near-ML’  decoder  can  be  very  efficient  while 
having  a  loss  in  decoding  performance  that  is  negligible  in  practice. 
For  example,  the  extended  Golay  (24,12)  code  on  an  AWGN 
channel  can  be  RT  decoded,  using  the  M  algorithm,  to  within  0.25 
dB  of  ML  decoding  with  only  8  survivors.  Tables  are  presented 
that  summarize  the  number  of  metric  and  binary-vector  operations 
for  the  M-algorithm. 

Finally,  we  comment  that  some  other  important  factors  con¬ 
tribute  to  the  efficiency  of  RT  decoding.  First,  the  reduced  trellis 
exploration is facilitated by the use of a simplified trellis construction
method [7]. Second, the efficiency of the RT-M algorithm is
enhanced  by  the  use  of  a  survivor  selection  method  that  operates 
in  linear-time  [8],  needing  at  most  M  comparisons  at  each  depth. 

Abstract - The structure of the twisted squaring construction
is  studied.  We  focus  on  the  subclass  of  symmetric-reversible 
codes  and  show  that  it  includes  the  extended  primitive  BCH 
codes.  New  results  on  the  trellis  complexity  of  these 
constructions,  and  the  BCH  codes  in  particular,  are  derived. 

I.  Introduction 

Trellis  diagrams  are  primarily  used  for  efficient  soft-decision 
decoding [1]-[3]. The structure of the codes is a fundamental key
for  investigating  the  associated  trellis  diagrams  [2],[4]-[7].  The 
squaring  construction  (SC)  was  employed  by  Forney  [2]  to  derive 
trellis-oriented  designs,  particularly  applied  for  RM  codes  and 
Barnes-Wall  lattices.  We  are  interested  in  the  twisted  squaring 
construction  (TSC),  a  generalization  of  the  SC  [2].  The 
Nordstrom-Robinson  code  and  a  related  packing  are  known 
examples  of  nonlinear  TSC  [2].  We  classify  several  families  of  the 
TSC  and  focus  on  the  symmetric-reversible  codes.  We  show  that 
they  include  the  extended  primitive  BCH  codes.  The  constructions 
are  characterized  and  new  results  on  the  related  trellis  diagrams 
are  developed. 

II. The Twisted Squaring Construction

We follow the notations of [2]. Let S/T denote the partition of a
discrete set S into $M = |S/T|$ disjoint subsets $T_i$.
The minimum distance d(S) is defined as the
minimum nonzero distance $d(s_1, s_2)$ associated with any pair
$s_1, s_2 \in S$. We also define d(T) as the minimum $d(T_i)$ among the
subsets of S. Let $T_i^2$ denote the set of all pairs $(s_1, s_2)$ where
$s_1, s_2 \in T_i$. The SC is the union U of the M sets $T_i^2$, and $d(U) =
\min\{d(T), 2d(S)\}$ [2]. Let C(n,k) denote a linear code over GF(q)
with length n and dimension k. Let D be a subcode of C. The SC
is labeled by $|C/D|^2$. It consists of codewords $(d_1 + b, d_2 + b)$,
where $d_1, d_2 \in D$ and b belongs to the space $B = [C/D]$ of
coset representatives of D in C. The TSC is the union W of M
sets $T_i \times T_j$, where i and j cover all values between 0 and M. The
lower bound $d(W) \ge \min\{d(T), 2d(S)\}$ [2] suggests an improvement
of the TSC over the SC. The TSC in terms of linear codes will be
labeled by $\|C/D\|^2$. It consists of codewords $(d_1 + b, d_2 + b')$,
where b and b' run through all elements of B. Let $G_C$ and $G_D$
denote the generator matrices of C and D, respectively. The
generator matrix of $\|C/D\|^2$ is equivalent to

$$ \begin{pmatrix} G_C & G'_C \\ 0 & G_D \end{pmatrix}, $$

where $G'_C$ is obtained from $G_C$ by elementary row operations.

III.  Symmetric-Reversible  and  Related  Codes 

A code $A$ is called symmetric if $(a_1, a_2) \in A$ implies that
$(a_2, a_1) \in A$. We show that any symmetric code is a TSC, and
$\tilde{G}_C = E G_C$ such that $E$ is invertible and $E^2$ is equivalent to the
identity  matrix.  A  code  is  called  reversible  if  it  contains  the 
reversed  version  of  every  codeword.  A  symmetric-reversible  (SR) 
code  is  hereby  defined  as  a  code  that  is  both  symmetric  and 
reversible. We show that some of the above properties are inherited
by the subcodes C and D. A code is called affine-invariant (AI) if
it  is  invariant  under  the  affine  permutation.  This  class  includes  the 
Reed-Muller  (RM)  and  extended  primitive  BCH  codes.  We  prove 
that  AI  codes  are  iterated  SR  codes  (and  obviously  iterated  TSC), 
i.e.,  the  subcodes  C  and  D  are  also  SR  codes.  We  characterize  the 
constructions  and  show  that  the  dual  TSC  and  dual  SR  designs 
are,  respectively,  TSC  and  SR  designs. 

IV.  Trellis  Complexity 

A  general  description  of  trellis  diagrams  of  block  codes  is  given 
in  [1],[2].  For  a  given  coordinate  ordering,  the  minimal  trellis  size 
s  is  defined  as  the  maximal  state-space  dimension  of  the  minimal 
trellis diagram [2]. The minimal $s$ over any permutation of a code
$A$ is labeled by $s(A)$. The general Wolf bound is $s(A) \le \min\{k, n-k\}$
[1]. Let $A$ be a TSC code $\|C/D\|^2$. A simple recurrence formula
for the trellis complexity is given by

$$s(A) \le s(D) + \dim(C) - \dim(D).$$

Improved  bounds  are  derived  for  iterated  SR  codes  such  as 
primitive  BCH  codes.  Upper  bounds  on  the  decoding  complexity 
are  thereby  implied  in  conjunction  with  results  of  [3].  Some 
examples  of  binary  primitive  BCH  codes  are  given  in  Table  I. 
Actual s parameters were obtained numerically using a computer
program (see also [5], [6]).


Table I

Upper bounds on s(A) for primitive BCH codes

Code         C         D         Wolf Bound   New Bound   s
(16,7,6)     (8,6)     (8,1)      7            6           6
(32,21,6)    (16,15)   (16,6)    11           10          10
(64,45,8)    (32,29)   (32,16)   19           15          14
(64,36,12)   (32,26)   (32,10)   28           21          19
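
The Wolf bound column of Table I follows directly from min{k, n-k}, and each listed pair of component codes satisfies k = dim(C) + dim(D). The following is a minimal sketch of that consistency check; the tuple layout and the function names are illustrative assumptions, not part of the paper.

```python
# Rows of Table I: (n, k) of the BCH code, (n, k) of C, (n, k) of D,
# followed by the Wolf bound, the new bound and the computed s.
TABLE_I = [
    ((16, 7), (8, 6), (8, 1), 7, 6, 6),
    ((32, 21), (16, 15), (16, 6), 11, 10, 10),
    ((64, 45), (32, 29), (32, 16), 19, 15, 14),
    ((64, 36), (32, 26), (32, 10), 28, 21, 19),
]


def wolf_bound(n, k):
    """General Wolf bound on the state-space dimension: min{k, n - k}."""
    return min(k, n - k)


def check_row(row):
    """Check one table row for internal consistency."""
    (n, k), (n_c, k_c), (n_d, k_d), wolf, new, s = row
    return (
        wolf_bound(n, k) == wolf      # Wolf bound column
        and n == 2 * n_c == 2 * n_d   # the TSC doubles the length
        and k == k_c + k_d            # dimensions of C and D add up
        and s <= new <= wolf          # the bounds are ordered as expected
    )


if __name__ == "__main__":
    for row in TABLE_I:
        print(row[0], check_row(row))
```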

Additional results are developed utilizing the highly structured
constructions. Furthermore, the trellis complexity of long BCH
codes may be evaluated. The constructions may also be useful for
related applications such as the generalized weight hierarchy [8].
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Abstract  -  All  binary  linear  codes,  satisfing 
the  two-way  chain  condition  with  dimension  up 
to  6  are  described  in  terms  of  their  generator 
matrices. An expression for their state complexity
profile is also found. Cases in which such codes
are Z4-linear are shown.

I.  Introduction 

Let $C$ be a binary linear $[n, k, d]$ code. The support
of a vector $a = (a_1, a_2, \ldots, a_n)$ in $GF(2)^n$ is defined by
$\chi(a) = \{j \mid a_j \ne 0\}$. The minimum support weight, $d_r$,
of a code $C$ is the size of the smallest support of any
$r$-dimensional subcode of $C$. In particular $d_1 = d$.

The  concept  of  the  two-way  chain  condition  was  intro¬ 
duced  by  Forney  [3]. 


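The state complexity profile for codes from Lemma 1
is

$$\begin{array}{ll}
0\,1^{a_1-1}\,0\,1^{a_3-1}\,0\,1^{a_5-1}\,0\,1^{a_3-1}\,0\,1^{a_1-1}\,0 & \text{for } a_2 = a_4 = 0, \\
0\,1^{a_1}\,2^{a_2-1}\,1^{a_3}\,0\,1^{a_5-1}\,0\,1^{a_3}\,2^{a_2-1}\,1^{a_1}\,0 & \text{for } a_2 \ge 1,\ a_4 = 0, \\
0\,1^{a_1-1}\,0\,1^{a_3}\,2^{a_4-1}\,1^{a_5+1}\,2^{a_4-1}\,1^{a_3}\,0\,1^{a_1-1}\,0 & \text{for } a_2 = 0,\ a_4 \ge 1, \\
0\,1^{a_1}\,2^{a_2-1}\,1^{a_3+1}\,2^{a_4-1}\,1^{a_5+1}\,2^{a_4-1}\,1^{a_3+1}\,2^{a_2-1}\,1^{a_1}\,0 & \text{otherwise.}
\end{array}$$

The state complexity is 1 if $a_4 \le 1$ and $a_2 \le 1$,
and 2 otherwise.

Lemma 2 Codes described in Lemma 1 are Z4-linear
if $a_5$ is an even number.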

Acknowledgements 


Definition 1 An $[n, k]$ code $C$ satisfies the two-way
chain condition if it is equivalent to a code $C'$ with the
following property: there exist two chains of subcodes of
$C'$, the left chain $D_1^L \subset D_2^L \subset \cdots \subset D_k^L = C'$, and the
right chain $D_1^R \subset D_2^R \subset \cdots \subset D_k^R = C'$, where, for
$1 \le r \le k$, we have $\dim(D_r^L) = \dim(D_r^R) = r$,
$\chi(D_r^L) = \{1, 2, \ldots, d_r\}$, and
$\chi(D_r^R) = \{n - d_r + 1, n - d_r + 2, \ldots, n\}$.

The state complexity profile of a linear block code $C$ is
$s(C)$, where $s_i(C) = k - p_i - f_i$ and $p_i$, $f_i$ are the
dimensions of the past and future subcodes [2, 3, 5].

The concept of a binary Z4-linear code was introduced
by A. R. Hammons, P. V. Kumar, A. R. Calderbank, N.
J. A. Sloane, and P. Solé [4]. A binary code is Z4-linear if
its coordinates can be arranged so that it is the image
under the Gray map $\phi$ of a linear block code over $Z_4$, i.e.
an additive subgroup of $Z_4^n$.

II.  Main  Results 


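Lemma 1 A sufficient condition for an $[n, 5]$ code $C$
with $d_1 + d_2 < n$ to satisfy the two-way chain condition
is that $C$ is generated by a matrix

$$\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}$$

whose nine column blocks have widths $a_1, a_2, a_3, a_4, a_5, a_4, a_3, a_2, a_1$,
where

$a_j < a_1$, $2 \le j \le 5$,

$a_1 < a_j + a_{j+1}$, $2 \le j \le 4$.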
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Abstract  —  We  develop  a  theory  of  minimal  trellises 
for  convolutional  codes,  and  find  that  the  “standard” 
trellis  need  not  be  the  minimal  trellis. 

I.  Introduction 

From a minimal generator matrix $G(D)$ for an $(n, k, m)$
convolutional code $C$, it is possible to construct a “standard”
trellis representation for $C$. This trellis is in principle infinite,
but it has a very regular structure, consisting (after a short
initial transient) of repeated copies of what we shall call the
trellis module associated with $G(D)$. The trellis module consists
of $2^m$ “initial states” and $2^m$ “final states,” with each
initial state being connected by a directed edge to exactly $2^k$
final states. Each directed edge is labelled with an $n$-bit binary
vector, namely, the output produced by the encoder in
response to the given state transition.

Since the trellis module has $2^{k+m}$ edges, and each edge has
“length” (measured in bits) $n$, the total edge length of the
trellis module is $n \cdot 2^{k+m}$. Since each trellis module represents
the encoder’s response to $k$ input bits, we are led to define the
“standard trellis complexity” of the code as

$$\frac{n}{k} \cdot 2^{m+k} \text{ edges per bit.} \quad (1)$$
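
As a small worked instance of definition (1), the sketch below evaluates the standard trellis complexity exactly; the function name is an illustrative assumption. For the (8,4,3) code of the example section it gives 256 edges per bit.

```python
from fractions import Fraction


def standard_trellis_complexity(n, k, m):
    """Standard trellis complexity (n / k) * 2**(m + k), in edges per bit.

    n: output bits per trellis module, k: input bits per module,
    m: memory (the module has 2**m initial states).
    """
    return Fraction(n, k) * 2 ** (m + k)


if __name__ == "__main__":
    print(standard_trellis_complexity(8, 4, 3))
```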


Since  each  module  represents  four  encoded  bits,  the  trellis 
complexity,  as  measured  in  trellis  edges  per  encoded  bit,  is 
thereby  reduced  to  120. 

In this example, the trellis complexity can be reduced still
further, if we allow column permutations of the original generator
matrix $G(D)$ in (2). Indeed, by computer search we have
found that one “minimal complexity” column permutation for
this particular code is the permutation (01243567), which results
in the generator matrix (cf. (2))

$$G(D) = \begin{pmatrix} 11111111 \\ 11110000 \\ 10101100 \\ 10011010 \end{pmatrix}
+ \begin{pmatrix} 00000000 \\ 11011000 \\ 10110100 \\ 10001110 \end{pmatrix} D \quad (4)$$
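
The step from (2) to (4) can be checked mechanically: applying the column permutation (01243567), which swaps the fourth and fifth columns, to both coefficient matrices of (2) reproduces the matrices of (4). A minimal sketch, with the matrices stored as bit strings and all names chosen here for illustration:

```python
# Coefficient matrices of G(D) = G0 + G1 * D from (2), one string per row.
G0 = ["11111111", "11101000", "10110100", "10011010"]
G1 = ["00000000", "11011000", "10101100", "10010110"]

# Permutation (01243567): new column i is old column PERM[i].
PERM = [0, 1, 2, 4, 3, 5, 6, 7]


def permute_columns(rows, perm):
    """Apply the same column permutation to every row of a matrix."""
    return ["".join(row[j] for j in perm) for row in rows]


if __name__ == "__main__":
    print(permute_columns(G0, PERM))
    print(permute_columns(G1, PERM))
```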


Then after putting the minimal generator matrix of (4) into
“minimal span” form, it becomes

$$G(D) = \begin{pmatrix} 11111111 \\ 00001111 \\ 01111111 \\ 00111111 \end{pmatrix}
+ \begin{pmatrix} 00000000 \\ 11111000 \\ 11111100 \\ 11111110 \end{pmatrix} D \quad (5)$$


The  trellis  complexity  of  the  generator  matrix  in  (5)  turns  out 
to  be  104  edges  per  encoded  bit. 


The  standard  trellis  complexity  as  defined  in  (1)  is  a  mea¬ 
sure  of  the  effort  per  decoded  bit  required  by  Viterbi’s  algo¬ 
rithm.  However,  we  will  see  in  the  next  section  that  this  com¬ 
plexity  can  sometimes  be  reduced,  by  the  construction  of  a 
simplified  trellis  for  the  code. 


II.  Example 

Consider the (8,4,3), $d_{\text{free}} = 8$, “partial unit memory”
convolutional code with minimal generator matrix

$$G(D) = \begin{pmatrix} 11111111 \\ 11101000 \\ 10110100 \\ 10011010 \end{pmatrix}
+ \begin{pmatrix} 00000000 \\ 11011000 \\ 10101100 \\ 10010110 \end{pmatrix} D \quad (2)$$


(see  [3]).  According  to  (1),  the  “standard”  trellis  complexity 
of  this  code  is  256  edges  per  bit.  However,  it  is  quite  easy  to 
reduce  this  number,  as  follows. 

We  view  the  code  in  (2)  as  an  (infinite-length)  block  code, 
with  “scalar”  generator  matrix 


III.  General  Results 

We have found a simple algorithm for finding
a  generator  matrix  G(D)  for  a  convolutional  code,  for  which 
the  corresponding  “scalar”  generator  matrix  (cf.  (3))  is  in 
“minimal  span”  form  [4].  This  generator  matrix  can  then  be 
used  to  produce  the  minimal  trellis  for  the  convolutional  code. 
In  principle,  the  theory  of  minimal  trellises  for  convolutional 
codes  can  be  deduced  from  the  general  “Forney-Trott”  theory 
[2],  but  we  believe  the  observation  that  the  Viterbi  decoding 
complexity  of  convolutional  codes  can  be  thereby  systemati¬ 
cally  reduced  is  new,  as  are  the  details  of  the  algorithms  for 
producing  the  minimal  trellises. 

One  nice  by-product  of  our  theory  is  that  when  we  apply 
our  techniques  to  a  convolutional  code  obtained  by  puncturing 
[1], we always find a trellis for that code which is at least as
simple as the “punctured” trellis. Thus in the new theory,
punctured convolutional codes no longer appear as a special
class, but simply as high-rate convolutional codes whose trellis
complexity  turns  out  to  be  unexpectedly  small. 


$$G_{\text{scalar}} = \begin{pmatrix}
G_0 & G_1 & & & \\
 & G_0 & G_1 & & \\
 & & G_0 & G_1 & \\
 & & & \ddots & \ddots
\end{pmatrix} \quad (3)$$

where $G(D) = G_0 + D \cdot G_1$. From this representation, and
using a modification of the now “standard” theory of trellises
for block codes [4], one can see that the code has a minimal
trellis, built from trellis modules, each of which has 480 edges.

1This  work  was  partially  supported  by  a  grant  from  Pacific  Bell. 
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Abstract  —  Binary  block  codes  exceeding  the 
Gilbert-Varshamov bound on minimum Hamming
distance (if such codes exist) have their error exponents
below that of the Binary Symmetric Channel
(BSC) in the interval $(R_{\text{crit}}, R_c)$ and they cannot reach
the cutoff rate $R_0$ and thus the capacity $R_c$ of the BSC
if  a  Maximum  Likelihood  decoder  (MLD)  is  used. 

I.  Introduction 

In  [1]  a  nonstandard  technique  for  bounding  the  error  expo¬ 
nent  of  specific  families  of  channel  block  codes  was  introduced. 
Contrary  to  the  standard  methods  based  on  ensemble  averag- 
ing,  this  technique,  called  the  distance  distribution  method, 
enables  the  unification  of  three  different  approaches  to  the 
asymptotical  analysis  of  channel  codes:  channel  coding  theo¬ 
rems,  bounds  on  the  error  exponent,  and  bounds  on  the  min¬ 
imum  distance. 

II. The Connection between $d_{HGV}(R)$ and $R_0$

The general lower bound on the code family error exponent
was obtained in the following form

$$E(R)_{\mathcal{B}} \ge \min_{\delta_0(\mathcal{B},R) \le d \le \delta_1(\mathcal{B},R)} \bigl( E_{\mathcal{B}}(d,R) + E_e(d,R,\mathcal{B}) - R \bigr), \quad (1)$$

where $\mathcal{B}$ is an infinite family of channel block codes $B(R, N)$
over a finite or infinite alphabet, provided by some channel-determined
distance measure $d$ between its codewords. $\mathcal{B}$ is
characterized by the distance distribution exponents (DDE)

$$E_{\mathcal{B}}(d, R) = \lim_{N \to \infty} -\frac{1}{N}\,\mathrm{ld}\!\left( \frac{m_i}{M(M-1)} \right), \quad (2)$$

where $m_i$ represents the number of ordered pairs of codewords
from $B(R, N)$ that are at some fixed distance $d_i > 0$, $L$ the
total number of different values of $d > 0$ in $B(R, N)$ (arranged
in increasing order), and $M = 2^{RN}$ the number of codewords
in $B(R, N)$. The influence of the decoding algorithm and the
channel performance is characterized by the error effect exponent
(EEE)

$$E_e(d, R) = \lim_{N \to \infty} -\frac{1}{N}\,\mathrm{ld}(e_i), \quad (3)$$

where $e_i = P[\hat{x} = x_j \mid x_m,\ m \ne j,\ d_i = d(x_j, x_m)]$ represents
the error effect of the codeword $x_j$ when the codeword $x_m$ is
erroneously decoded, provided $x_m$ and $x_j$ are at some fixed
distance $d_i$. $E_e(d, R, \mathcal{B})$ in (1) denotes some lower bound
on (3). For each fixed value $R > 0$ of the code rate, chosen
from the set of possible family rates $\mathcal{R}$, the code family
$\mathcal{B}$ contains an infinite fixed rate sequence of block codes
$FRS(R, \mathcal{B}) = (B(R, N_1), B(R, N_2), \ldots)$ where $N_i < N_{i+1}$
and $R = \mathrm{ld}\, M_i / N_i = \text{const}$, $i = 1, 2, \ldots$ with $M_i = |B(R, N_i)|$.
$\delta_0(\mathcal{B}, R)$ and $\delta_1(\mathcal{B}, R)$ in (1) are the asymptotic values of the
minimum and maximum distances of codes in $FRS(R, \mathcal{B})$ for each $R \in \mathcal{R}$.


For the family $\mathcal{B}_{ub}$ of uniformly distributed binary codes,
whose Hamming distances are binomially distributed,

$$\frac{m_i}{M(M-1)} \approx 2^{-N} \binom{N}{i}, \quad i = d_{Hi} = 1, 2, \ldots, N \quad (4)$$

for all rates $R \in \mathcal{R} = [0,1]$, the DDE function (2) in
the interval [0, 0.5] represents the Gilbert-Varshamov curve
$d_{HGV}(R)$ when $d$ represents the normalized Hamming distance
$d_H = d_{Hi}/N$, i.e., $E_{\mathcal{B}_{ub}}(d_H, R) = 1 - H(d_H)$, where
$H(x)$ is the binary entropy function [2]. On the other hand,
when $d$ represents the normalized Bhattacharyya distance
on the BSC (with transition probability $p$) given by
$d_B = -\mathrm{ld}\sqrt{4p(1-p)}\; d_H$, the DDE function (2), $E_{\mathcal{B}_{ub}}(d_B, R)$, of the
family $\mathcal{B}_{ub}$ determines the cutoff rate $R_0$ of the BSC. Under
the usual condition of equal prior probabilities of codewords
and using the MLD, this fact was shown in [1] by replacing
$E_{\mathcal{B}}(d, R)$ with $E_{\mathcal{B}_{ub}}(d_B, R)$ in (1) and using a very simple lower bound on
the EEE function (3) given by $E_e(d_B, R, \mathcal{B}) = E_e(d_B) = d_B$.
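
The connection to the cutoff rate can be checked numerically. The sketch below assumes the reading of the text in which the DDE is $1 - H(d_H)$, the EEE lower bound is $d_B = -\mathrm{ld}\sqrt{4p(1-p)}\,d_H$, and the bound (1) minimizes their sum over $d_H$ before subtracting $R$. It compares that minimum with the standard closed form $R_0 = 1 - \mathrm{ld}(1 + 2\sqrt{p(1-p)})$ for the BSC, which is a textbook expression and is not quoted from this abstract. Function names and the grid search are illustrative choices.

```python
import math


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def cutoff_rate_bsc(p):
    """Standard cutoff rate of the BSC with crossover probability 0 < p < 1."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))


def min_dde_plus_eee(p, steps=100000):
    """Grid minimum over d_H in (0, 0.5] of (1 - H(d_H)) + d_B(d_H)."""
    slope = -math.log2(math.sqrt(4.0 * p * (1.0 - p)))  # d_B = slope * d_H
    best = float("inf")
    for i in range(1, steps + 1):
        d_h = 0.5 * i / steps
        best = min(best, 1.0 - binary_entropy(d_h) + slope * d_h)
    return best


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1, 0.2):
        print(p, min_dde_plus_eee(p), cutoff_rate_bsc(p))
```

Under these assumptions the two columns agree to within the grid resolution, so the bound (1) reduces to $R_0 - R$ for this family.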

III. Sketch of the Proof

The first step in the proof of the main statement of this paper
is to show that the Hamming distance distribution exponent
$E_{FRS^*}(d_H, R')$ of a hypothetical fixed rate sequence
$FRS^*(R')$ of binary block codes, whose asymptotic normalized
minimum Hamming distance exceeds the Gilbert-Varshamov bound,
must intersect the Gilbert-Varshamov curve.
This can be proven by choosing
the special value $p = p'_{\text{crit}}$ for which $R' = R_{\text{crit}}$. Then
$E_{FRS^*}(d_H, R') > E_{\mathcal{B}_{ub}}(d_H, R')$ for $0 < d_H < 0.5$ contradicts
the space-partitioning upper bound on $E(R)_{BSC}$. Furthermore,
using the distance distribution method it can be shown
that the cutoff rate lower bound $E_0(R')_{FRS^*}$ on the error
exponent of $FRS^*(R')$ on the BSC must be smaller than the
cutoff rate lower bound $E_0(R)$ on the error exponent of the
BSC for $R = R'$ and for $p > p'_{\text{crit}}$, i.e., when $R_{\text{crit}} < R' < R_c$.

IV. Conclusion

It  is  shown  that  the  still  open  famous  problem  of  finding  bi¬ 
nary  codes  with  minimum  distances  that  exceed  the  Gilbert- 
Varshamov  bound  is  irrelevant  when  MLD  is  used.  In  this 
case  the  Gilbert-Varshamov  bound  (curve)  uniquely  deter¬ 
mines  the  error  exponent  of  asymptotically  optimal  binary 
block codes on the BSC in the interval $(R_{\text{crit}}, R_c)$. The same
conclusion  is  valid  for  spherical  codes  on  the  AWGN  channel 
that  exceed  the  Shannon  lower  bound  on  minimal  Euclidean 
distance. 
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Abstract  —  It  is  well  known  that  time-varying  con¬ 
volutional  codes  can  achieve  the  capacity  of  a  discrete 
memoryless channel [1]. The time-varying assumption
is needed in the proof to assure pairwise independence
between the codewords. In this work we provide a
relatively simple proof that time-invariant convolutional
codes can indeed achieve the capacity without any
restriction (although the error exponent achieved by our
proof may not be optimal).

I.  Overview 

We consider the following setting of fixed (time-invariant)
convolutional codes with rate $R = b/n$ bits per symbol: At each
time instance an information vector $u_t = \{u_t^1, u_t^2, \ldots, u_t^b\}$ of $b$
bits is pushed into a delay line (register) of length $K$ (i.e.,
the delay line contains $b \cdot K$ bits). Then $n \cdot q$ bits, $a_{ij}$,
$i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, q\}$, which are linear combinations of
bits in the register, are calculated. These combinations define
the specific convolutional code. $n$ output symbols, $\{o_i\}_{i=1}^{n}$,
are produced using a mapping from bits to channel symbols,
$\mathcal{M} : \{0,1\}^q \to \{1, \ldots, J\}$, $o_i = \mathcal{M}(a_{i1}, \ldots, a_{iq})$. The
mapping defines a distribution $Q(k) = 2^{-q} |\{a : \mathcal{M}(a) = k\}|$. Note
that as $q \to \infty$ any distribution $Q$ can be approximated.
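
The distribution induced by the bit-to-symbol mapping can be computed by direct enumeration. The sketch below is illustrative only: the dictionary representation of the mapping and the function name are assumptions made here, not notation from the paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def induced_distribution(mapping, q):
    """Return Q(k) = 2**(-q) * |{a : mapping[a] = k}| for q-bit inputs a.

    mapping: dict from q-bit tuples to channel symbols in {1, ..., J}.
    """
    counts = Counter(mapping[bits] for bits in product((0, 1), repeat=q))
    return {symbol: Fraction(c, 2 ** q) for symbol, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical mapping with q = 2 and J = 2: three inputs go to
    # symbol 1 and one input goes to symbol 2.
    example = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 2}
    print(induced_distribution(example, 2))
```

With $q$ bits every probability is a multiple of $2^{-q}$, which is why a larger $q$ allows a finer approximation of a target distribution.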

We show that for a given DMC with a transition probability
$P(y|x)$, a distribution $Q(x)$ and any $b$ and $n$ such that
$b/n < I(Q;P)$, where $I(\cdot\,;\cdot)$ is the mutual information, there
exists a sequence of convolutional codes of increasing $K$ such
that for $K \to \infty$, $P_{error} \to 0$ exponentially, where $P_{error}$ is the
probability of an error in decoding $N \cdot b$ transmitted bits.

II.  Outline  of  the  proof 

We analyze the average performance of an ensemble of
convolutional codes, defined by randomly chosen $q \cdot n$ linear
combinations (requiring $q \cdot n \cdot b \cdot K$ random bits), and by a
random initial value of the register.

Our proof analyzes a sub-optimal decoding procedure in
which at each time point $t$ we decode the information symbol
$u_t$ based on a future observed block of size $L_t \cdot n$ symbols. The
value $L_t$ will be chosen, as described below, so that Gallager’s
technique to upper bound the error probability in block coding
(see [3, pp. 135-150]) can be applied, i.e., so that there is
pairwise independence between the true codeword and
any codeword that can cause an error in decoding $u_t$. If an
error occurs at any time point, we declare that our
decoding has failed. We shall show that, on the average, the
error probability in decoding $u_t$ vanishes exponentially in
$K$. Thus, as long as the information sequence length $N$ is
short enough, $P_{error}$ will also vanish exponentially.

Specifically, we first constrain $L_t \le K/2$. Then, we use the
fact that if $A$ is a binary matrix with rank $l$ and $v$ is a random
binary vector with uniform iid components, then the random
vector $Av$ has $l$ uniform iid components. A lower bound on
the number of symbols we can use and still have pairwise
independence, $L_t \cdot n$, can be calculated from the current register
value. We assume that the current register value is known, since
otherwise an error has already occurred, and there is no need
to calculate the error probability in decoding $u_t$. It can be
shown that taking $L_t = l$, where $l$ is the maximum number
such that the rows of the matrix

/  U t_K  ut--£-l  •  U«-A'+1  \ 

Ut_£  +  1  Ut-4  Ut-JC+2 

\  1  Ut-^  +  !-2  *"  Ut-K+l  / 

are still linearly independent, will ensure the desired pairwise
independence. (This matrix is known to the decoder because
it contains only bits that have already been decoded.)
Now, we analyze the error probability, averaged over
a uniform choice over the messages, i.e., under the assumption
that the $u_t$ are uniformly distributed. In this case we have
$\Pr\{L_t = l\} \le 2^{b(l - K/2)}$. For each value of $L_t$ we face the
situation where we observe a block of $L_t \cdot n$ symbols and we try to
decide between at most $2^{b L_t}$ randomly chosen different possible
inputs. The error probability in this case can be upper
bounded by Gallager’s exponential expression for block codes.
Using this expression, and taking the expectation with respect
to $L_t$, we get for the error probability $\epsilon$ in decoding $u_t$:

$$\epsilon \le \sum_{l=0}^{K/2} 2^{b(l - K/2)} \cdot 2^{b l \rho} \cdot 2^{-n l E_0(\rho, Q)}
\le (1 + K/2) \cdot 2^{-n \frac{K}{2} \min(R,\; E_0(\rho, Q) - \rho R)},$$

$$P_{error} \le N \epsilon.$$

For $R < I(Q; P)$ and $\log N = o(K \cdot n)$, the expression above
goes to zero exponentially with $K \cdot n$.
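
The last inequality can be sanity-checked numerically. The sketch assumes the displayed bound reads $\sum_{l=0}^{K/2} 2^{b(l-K/2)} 2^{bl\rho} 2^{-nlE_0} \le (1 + K/2)\, 2^{-n(K/2)\min(R,\,E_0 - \rho R)}$ with $R = b/n$ and $K$ even. The value of $E_0(\rho, Q)$ is passed in as a plain number, since computing it requires the channel, and the parameter values in the demo are hypothetical.

```python
def sum_and_closed_form(b, n, K, rho, e0):
    """Return (sum over l, closed-form upper bound) for even K.

    Each summand is 2**(b*(l - K/2) + b*l*rho - n*l*e0); its exponent is
    linear in l, so it is largest at l = 0 or l = K/2, and bounding every
    one of the K/2 + 1 terms by the largest gives the closed form.
    """
    rate = b / n
    half = K // 2
    total = sum(
        2.0 ** (b * (l - half) + b * l * rho - n * l * e0)
        for l in range(half + 1)
    )
    closed = (1 + half) * 2.0 ** (-n * half * min(rate, e0 - rho * rate))
    return total, closed


if __name__ == "__main__":
    print(sum_and_closed_form(1, 2, 20, 0.5, 0.4))
    print(sum_and_closed_form(1, 2, 20, 0.5, 1.0))
```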

III.  Discussion  and  further  improvements 

The error exponent achieved above is worse than the error
exponent for time-varying convolutional codes [1], and even worse
than the error exponent for block codes [3]. A better exponent was
achieved for special cases such as the BSC by a slight change in the
proof. In [2], it was claimed (without proof) that for $b \to \infty$,
time-invariant convolutional codes can achieve the same exponent
as time-varying codes. This claim was also proved by us
with a similar technique (but without constraining $L_t$ to be less
than $K/2$). The question of whether fixed convolutional codes
have the same error exponent as time-varying codes for any $b$ is still
under investigation.
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Abstract  —  This  paper  proposes  an  explicit  con¬ 
struction  of  codes  achieving  Shannon’s  capacity  for  ar¬ 
bitrary  discrete  memoryless  channels.  The  proposed 
codes  are  obtained  by  applying  the  idea  of  variable 
concatenation to a class of concatenated codes that
employ algebraic geometry codes as outer codes.
Further,  we  clarify  that  the  error  exponent  of  the  pro¬ 
posed  code  is  equal  to  the  error  exponent  obtained  by 
Forney  for  concatenated  codes. 

I.  Introduction 

In 1982, P. Delsarte and P. Piret gave an explicit construction
of codes achieving the capacity and admitting a simple decoding
algorithm [1]. Recently, M. Steiner extended their results
and gave an explicit construction of codes having the decoding
error probability bounded by an exponential function of block
length at all rates below capacity for any discrete memoryless
channel [2]. However, the error exponent of the code is far
below Forney’s error exponent, which gives the performance
obtainable with concatenated codes [3].

This paper proposes a new explicit algebraic construction
of codes achieving Shannon’s capacity for any discrete memoryless
channel. The proposed code can be regarded as a generalization
of Justesen codes [4] followed by a channel-dependent
mapping, employing an algebraic geometry code [5] as the
outer code. The proposed codes are optimum in the sense that
they can attain Forney’s error exponent.


In order to apply the proposed code C to a channel, the
symbols of inner codewords are mapped into channel input
symbols by a channel-dependent mapping $\eta : GF(q) \to X$.
The mapping $\eta$ is constructed such that the occurrence
probability of $x \in X$ approximates the desired input probability
$Q_{\max}(x)$ which achieves the capacity of the channel [6]. More
precisely, for all $x \in X$, let $i_x$ be integers such that
$i_x/q \approx Q_{\max}(x)$ and $\sum_{x \in X} i_x = q$. Then, $i_x$ distinct symbols of
$GF(q)$ are mapped into the identical channel input symbol $x$.
Hence, for any $\delta > 0$ and appropriately chosen $i_x$ ($x \in X$), we
can find a sufficiently large $q$ (or $L$) such that

$$\left| Q_{\max}(x) - \frac{i_x}{q} \right| \le \delta \quad \forall x \in X. \quad (2)$$

Let us denote this channel-dependent code by C.
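
One simple way to pick integers $i_x$ with $\sum_x i_x = q$ and $i_x/q$ close to a target distribution is largest-remainder rounding. This is an illustrative choice made here, not necessarily the authors' construction, and the function name and the sample distribution are assumptions. Each resulting $i_x/q$ differs from the target by less than $1/q$, so any $\delta > 0$ is met once $q$ is large enough.

```python
from fractions import Fraction


def symbol_counts(target, q):
    """Integers i_x summing to q with |i_x / q - target[x]| < 1 / q.

    target: dict from channel input symbols to probabilities
            (Fractions that sum to 1).
    """
    scaled = {x: Fraction(p) * q for x, p in target.items()}
    counts = {x: int(v) for x, v in scaled.items()}  # floor, since v >= 0
    leftover = q - sum(counts.values())
    # Hand the remaining units to the symbols with the largest remainders.
    order = sorted(target, key=lambda x: scaled[x] - counts[x], reverse=True)
    for x in order[:leftover]:
        counts[x] += 1
    return counts


if __name__ == "__main__":
    q_max = {"a": Fraction(1, 3), "b": Fraction(2, 3)}  # hypothetical target
    print(symbol_counts(q_max, 16))
```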


IV.  Exponential  Error  Bounds 

The  next  theorem  is  our  main  result. 

Theorem 1: Let the inner codes be decoded by maximum
likelihood decoding and the outer code by GMD decoding.
Then, on an arbitrary DMC, for arbitrarily fixed $\epsilon > 0$ and
sufficiently large $m$ and $L$, the proposed code C of overall length
$N_0$ and overall rate $R_0$ ($= K_0/N_0$) has the average probability
of decoding error $P_e$ bounded by

$$P_e \le q^{-N_0 E_P(R_0, \epsilon, \delta)}, \quad (3)$$

where 


II.  The  Ensemble  of  Inner  Encoders 

Let us consider a discrete memoryless channel (DMC) with
input alphabet $X$ and output alphabet $Y$. We assume that a
set of messages delivered by the information source consists of
all $k$-tuples $a = (a_1, a_2, \ldots, a_k)$ with $a_i \in GF(q)$, for a certain
positive integer $k$.

Let us identify $GF(q)^k$ with any $k$-dimensional subspace of
$GF(q^n)$. To each pair $(\gamma, \alpha)$ of elements $\gamma$ and $\alpha$ of $GF(q^n)$
we associate the affine encoder $g : GF(q)^k \to GF(q^n)$ given
by

$$g(a) = \gamma a + \alpha, \quad a \in GF(q)^k, \quad (1)$$

and define $G_A$ to be the set of all such encoders. Further, let
$G$ be the set of encoders obtained by expurgating the encoder
with $\gamma = \alpha = 0$ from $G_A$. Obviously, $G \subset G_A$, $|G_A| = q^{2n}$
and $|G| = q^{2n} - 1$. As is usual, the number $r = k/n$ is
referred to as the rate.

III.  Code  Construction 

The outer code is formed by an algebraic geometry code
$C_h(N, K)$ constructed from a generalized Hermitian curve [5],
which is a linear code over $GF(q^{2m})$ with $q = 2^L$ and the
code length $N = q^{2n} - 1$. The inner codes are variable $(n, k)$
codes over $GF(q)$ which belong to $G$, where $k = 2m$. The
overall concatenated code is an $(N_0, K_0)$ code over $GF(q)$,
where $N_0 = nN$ and $K_0 = 2mK$. Let us denote the proposed
(channel-independent) code by C.


$$E_P(R_0, \epsilon, \delta) = \max_{0 < R < 1} (1 - R) \left( E\!\left( \frac{R_0}{R} \right) - \epsilon - \alpha(\delta) \right), \quad (4)$$

$E(r)$ is Gallager’s reliability function [6], and $\alpha(\delta) \to 0$ as
$\delta \to 0$. □
By choosing $\epsilon$ and $\delta$ properly, we can obtain $E_P(R_0, \epsilon, \delta) > 0$
whenever $R_0 < C_0$, where $C_0$ denotes the capacity of the
original channel. This implies that the proposed codes achieve
Shannon’s capacity. Further, the error exponent $E_P(R_0, 0, 0)$
is equal to Forney’s error exponent [3].
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Abstract  —  Shannon’s  restricted  two-way  channel  is 
studied. Outer and inner bounds are obtained
for the region of rates achievable when the error probabilities
decrease exponentially with given exponents at the two
outputs.

A restricted two-way channel (RTWC) is defined by a matrix
of transition probabilities

$$W = \{ W(y_1, y_2 \mid x_1, x_2),\ x_1 \in \mathcal{X}_1,\ x_2 \in \mathcal{X}_2,\ y_1 \in \mathcal{Y}_1,\ y_2 \in \mathcal{Y}_2 \},$$

where $\mathcal{X}_1$, $\mathcal{X}_2$ are the finite input alphabets and $\mathcal{Y}_1$, $\mathcal{Y}_2$ are
the finite output alphabets of the channel. The channel is
supposed to be memoryless. The RTWC is represented in the figure.


There are two terminals. When the symbol $x_1 \in \mathcal{X}_1$ is
sent from the first terminal, the corresponding output symbol
$y_1 \in \mathcal{Y}_1$ arrives at terminal 2. At the same time the
input symbol $x_2$ is transmitted from terminal 2 and the
corresponding symbol $y_2$ arrives at the opposite terminal. Let

$$\mathcal{M}_1 = \{1, 2, \ldots, |\mathcal{M}_1|\} \quad \text{and} \quad \mathcal{M}_2 = \{1, 2, \ldots, |\mathcal{M}_2|\}$$

be the message sets of the corresponding sources. The code for
the RTWC is a collection of mappings $(f_1, f_2, g_1, g_2)$, where
$f_1 : \mathcal{M}_1 \to \mathcal{X}_1^n$, $f_2 : \mathcal{M}_2 \to \mathcal{X}_2^n$ are the encodings and
$g_1 : \mathcal{M}_2 \times \mathcal{Y}_1^n \to \mathcal{M}_1$, $g_2 : \mathcal{M}_1 \times \mathcal{Y}_2^n \to \mathcal{M}_2$ are the decodings.

The restriction mentioned in the name of the model means
that in the RTWC there are no connections from decoders to
encoders on the same terminal. The average error probabilities
of the code, $e_i(f_1, f_2, g_1, g_2)$, $i = 1, 2$, are considered.

Let $\epsilon = (\epsilon_1, \epsilon_2)$, $0 < \epsilon_i < 1$, $i = 1, 2$. Nonnegative numbers
$R_1$, $R_2$ are called an $\epsilon$-achievable rate pair for the RTWC if for any
$\delta_i > 0$, $i = 1, 2$, there exists a code such that for sufficiently
large $n$

$$\frac{1}{n} \log |\mathcal{M}_i| \ge R_i - \delta_i, \quad i = 1, 2,$$

and

$$e_i(f_1, f_2, g_1, g_2) \le \epsilon_i, \quad i = 1, 2.$$

The set of all $\epsilon$-achievable rate pairs is called the capacity
region. For $\epsilon_i = \exp(-nE_i)$, $E_i > 0$, $i = 1, 2$, $E = (E_1, E_2)$,
we call the region of achievable rates the $E$-capacity region $C(E)$.

The RTWC was first investigated by Shannon [1], who obtained
the capacity region of the RTWC. Important results
relating to various models of two-way channels were obtained
by Ahlswede [2-4], Zhang, Berger and Schalkwijk [5], and Han [6].
Papers of Van der Meulen [7], Gelfand and Prelov [8] and the
book of Csiszár and Körner [9] contain detailed surveys.

In the present paper the outer sphere packing and the inner
random coding bounds for $C(E)$ are constructed. For small $E$
these bounds coincide, and when $E_i \to 0$, $i = 1, 2$, we obtain the
capacity region of the RTWC.

The inner bound is obtained using Shannon’s random
coding method, and the outer bound is constructed by the
combinatorial method proposed by Haroutunian [10].
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Abstract  —  It  is  shown  that  lattice  codes  (intersec¬ 
tion  of  a  sphere  with  a  possibly  translated  lattice)  can 
achieve  capacity  on  the  additive  white  Gaussian  noise 
channel. 


I.  Introduction 

Consider the additive white Gaussian noise (AWGN) channel
with peak signal-power constraint $S$. It is well known [1] that
the capacity of this channel is $C = \frac{1}{2}\log(1 + \frac{S}{N})$, where $N$ is the
variance of the i.i.d. Gaussian noise. The proof in [1] is
nonconstructive in nature and, hence, codes that achieve capacity
may exhibit little or no structure, making them ill suited for
practical applications. An important class of structured codes
are lattice codes, which we define to be the intersection of a
possibly translated lattice $\Lambda$ with a spherical bounding region
centered at the origin. The following facts are known: (1)
For any rate $R < \frac{1}{2}\log(\frac{S}{N})$ there exists a lattice code which
results in an arbitrarily small (maximum) probability of error
when used with lattice decoding [4, 5, 6]. (2) If we choose
the code as the intersection of a possibly translated lattice
with a “thin” spherical shell centered at the origin then rates
up to capacity can be achieved with arbitrarily low (average)
probability of error under minimum distance decoding [2,
3]. Further, the rate at which the error probability tends to
zero is essentially equal to the optimum one as determined by
Shannon [1]. Regarding the second result, in [3] it was pointed
out that because of the “thin” spherical bounding region these
codes resemble random codes more than lattice codes.

We  use  [2,  3]  to  close  one  of  the  remaining  gaps  by  showing 
that  lattice  codes  (where  the  boundary  region  is  a  sphere  as 
opposed  to  a  spherical  shell)  combined  with  minimum  distance 
decoding  can  achieve  capacity.  This  is 

Theorem 1 Let $S$, $N$ and $\epsilon > 0$ be given. If $R < \frac{1}{2}\log(1 + \frac{S}{N})$
then there exists a lattice code for the additive white Gaussian
noise channel with peak power constraint $S$ and noise variance
$N$ with rate lower bounded by $R$ and average probability of
error of a minimum distance decoder upper bounded by $\epsilon$.

II.  Proof  Outline 

The  result  is  not  surprising  since  in  high  dimensions  most  of 
the  volume  within  a  sphere  lies  in  a  thin  spherical  shell  and, 
hence, by adding the volume of the inner sphere to the bounding
region we expect that not too many new lattice points are
added.  The  two  new  key  ingredients  which  make  it  possible 
to  extend  the  proof  in  [2,  3]  to  Theorem  1  can  be  stated  as 
follows. 

Let $S$ be the available signal power per dimension and $N$
the noise variance. Let $R$ be given such that $R < \frac{1}{2}\log(1 + \frac{S}{N})$.
Then there exist numbers $R'$ and $S'$ such that $R < R' =
\frac{1}{2}\log(1 + \frac{S'}{N}) < \frac{1}{2}\log(1 + \frac{S}{N})$. Let $T_n$ be the n-dimensional
closed sphere of radius $\sqrt{nS}$ and volume $V_n$, and let $T'_n$ be
the n-dimensional open sphere of radius $\sqrt{nS'}$ and volume
$V'_n$. Further, define $T_n^\Delta = T_n - T'_n$ with volume $V_n^\Delta = V_n - V'_n$.
Given a lattice $\Lambda_n$ with fundamental region $P_n$ and $s \in
P_n$, define the lattice code $C_n = C_n(\Lambda_n, s) = (\Lambda_n + s) \cap T_n$.
Similarly, define the subcodes $C'_n = C'_n(\Lambda_n, s) = (\Lambda_n + s) \cap T'_n$
and $C_n^\Delta = C_n^\Delta(\Lambda_n, s) = (\Lambda_n + s) \cap T_n^\Delta = C_n(\Lambda_n, s) \setminus C'_n(\Lambda_n, s)$.
Let $M_n = M_n(\Lambda_n, s)$, $M'_n = M'_n(\Lambda_n, s)$ and $M_n^\Delta = M_n^\Delta(\Lambda_n, s)$
be the cardinalities of these codes.

This work was supported by National Science Foundation Grant
NCR-9357689 and NCR-9304763.
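
The volume intuition behind this construction can be checked numerically. Because the volume of an n-dimensional sphere scales as the nth power of its radius, the inner sphere of radius $\sqrt{nS'}$ holds a fraction $(S'/S)^{n/2}$ of the volume of the sphere of radius $\sqrt{nS}$, so the shell holds the fraction $1 - (S'/S)^{n/2}$. The sketch below evaluates this fraction; the function name and the numerical values of $S$ and $S'$ are illustrative assumptions, not taken from the paper.

```python
def shell_fraction(n, s_outer, s_inner):
    """Fraction of the volume of the radius-sqrt(n*s_outer) sphere that lies
    outside the concentric radius-sqrt(n*s_inner) sphere, in n dimensions."""
    if not 0 <= s_inner <= s_outer:
        raise ValueError("need 0 <= s_inner <= s_outer")
    # Volume scales as radius**n, so the inner/outer ratio is (S'/S)**(n/2).
    return 1.0 - (s_inner / s_outer) ** (n / 2)


if __name__ == "__main__":
    # Illustrative (assumed) values: S = 1.0 and S' = 0.95.
    for n in (2, 10, 100, 1000):
        print(n, shell_fraction(n, 1.0, 0.95))
```

Even with $S'$ only slightly below $S$, the shell fraction approaches one as the dimension grows, which is why adding the inner sphere to the bounding region is expected to add comparatively few lattice points.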

The first lemma states that adding the lattice points within
the inner sphere does not increase the error probability by
more than the fraction of these points to the total number
of codewords. More precisely, if $P_e^{C}$ denotes the average error
probability of a code $C$ under minimum distance decoding then
we have

Lemma 1 $P_e^{C_n} \le \frac{M'_n}{M_n} + P_e^{C_n^\Delta}$.

The  second  lemma  shows  that  the  translation  vector  of  the 
lattice  can  indeed  be  chosen  in  such  a  way  that  there  are 
sufficiently  many  lattice  points  within  the  spherical  shell  but 
not  too  many  within  the  inner  sphere. 


Lemma 2 Let $\Lambda_n$ be a lattice with fundamental region $P_n$
and determinant $\det(\Lambda_n)$ and define


Pn  =  {s  e  Pn:  Mn{An,  s)  > 


VnA  K(An,*)  Vn 

4det(An)’ M£(An,s)  ~  VnA 

Then  VA  <  2  fp.  MA(An,  s)  dV(s). 

These  two  lemmas  are  then  used  together  with  the  methods 
presented  in  [2,  3]  to  prove  Theorem  1. 
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Abstract  —  We  show  that  single-user  codes  and  a 
modified  successive  decoding  scheme  can  be  used  to 
achieve the symmetric capacity of an L-out-of-K additive
white Gaussian noise channel at all signal-to-noise ratios.

An L-out-of-K (LOOK) additive white Gaussian noise
(AWGN) channel models a multiuser system with $K$ potential
users, but with at most $L$ simultaneously active users. The
received signal, when the set of active users is $S$ ($|S| \le L$), is
given by

$$Y(S) = \sum_{k \in S} X_k + N,$$

where $N$ is white Gaussian noise. We assume that neither the
transmitters nor the receiver know the set of active users.

The capacity region of this channel is known [1] and the
symmetric capacity, defined as the maximum value of $R$ where
$(R, \ldots, R)$ is in the capacity region, is given by

$$C_{sym}(L, w) = \frac{1}{2L}\log[1 + Lw],$$

when all users have the same symbol signal-to-noise ratio
(SNR), $w$. The result remains the same even if the users are
frame-asynchronous.

It is important to note that this is the same as the symmetric
capacity of an L-user AWGN channel. Hence, the fact
that the transmitters do not know the set of active users does
not cause any degradation in the symmetric capacity.

At low SNR, we show that single-user codes can be used
to achieve a rate very close to the symmetric capacity. At low
SNR (i.e. $w/(1 + (L-1)w) \ll 1$ or $C_{sym} \ll 1$), binary signaling
is close to optimal. If each user uses a low rate convolutional
code and a binary scrambler before transmission, the
codeword probability of error associated with the single-user
soft-decision Viterbi decoder is well approximated by treating
all other users' signals as Gaussian noise. This is mainly
because the maximum likelihood codeword decision is based
on the sum of many received symbols (at least $D$, where $D$ is
the free distance of the convolutional code). When the code
rate is low, $D$ is large (for sufficiently large constraint length)
and the Central Limit Theorem applies as the scramblers at
the transmitters ensure i.i.d. transmit symbols. Hence, the
capacity of each user, regardless of which set of $L$ users is
active, is closely approximated by

$$C_{suc}(L, w) = \frac{1}{2}\log\left[1 + \frac{w}{1 + (L-1)w}\right].$$


Defining the symmetric capacity ratio

$$\eta_{suc}(L, w) = \frac{C_{suc}(L, w)}{C_{sym}(L, w)},$$

we find that $\lim_{w \to 0} \eta_{suc}(L, w) = 1$ and $\lim_{w \to \infty} \eta_{suc}(L, w) = 0$.
Hence, treating other users' signals as noise is near optimal
at low SNR since background noise is the dominating factor.

1 This work was performed at the University of Colorado at Boulder.
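
The two limits of the capacity ratio can be illustrated numerically. The sketch below assumes the readings $C_{sym}(L, w) = \frac{1}{2L}\log(1 + Lw)$ and $C_{suc}(L, w) = \frac{1}{2}\log(1 + w/(1 + (L-1)w))$, uses natural logarithms (the base cancels in the ratio), and the function names are hypothetical.

```python
import math


def c_sym(L, w):
    """Assumed symmetric capacity per user: log(1 + L*w) / (2L), in nats."""
    return math.log1p(L * w) / (2 * L)


def c_suc(L, w):
    """Assumed single-user rate when the other L-1 active users are
    treated as additional Gaussian noise."""
    return 0.5 * math.log1p(w / (1 + (L - 1) * w))


def eta_suc(L, w):
    """Ratio of the single-user rate to the symmetric capacity."""
    return c_suc(L, w) / c_sym(L, w)


if __name__ == "__main__":
    for w in (1e-4, 1e-2, 1.0, 1e2, 1e4):
        print(w, eta_suc(4, w))
```

The ratio decays only logarithmically in $w$, so the approach to zero at high SNR is slow, while the approach to one at low SNR is fast.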

At high SNR, we propose a modified successive decoding
scheme that uses only single-user coding and decoding techniques
to achieve the symmetric capacity. In the LOOK channel,
since none of the transmitters knows which users are
active, the successive decoding (or onion peeling) scheme used
in [2, 3, 4] cannot be applied directly. In this modified approach,
we split each user into $N$ sub-users and apply the successive
decoding scheme on the sub-users of the users, instead
of on the $K$ users themselves. The receiver consists of an N-level
successive decoding scheme. In the nth level, the receiver
decodes the nth sub-users of all users, treating the remaining
interference from all sub-users of all users as noise. Then, it
subtracts the re-encoded signals of the nth sub-users from the
received signal and passes the difference to the (n+1)th level.
Since the sub-users have small signal-to-interference-and-noise
ratios, binary signaling is near optimal and the aforementioned
argument for the Gaussian approximation in calculating the
capacity holds. Hence, the symmetric capacity is closely approximated
by


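$$C_{suc,sd}(N, L, w) = \max \sum_{n=1}^{N} \frac{1}{2}\log\left[1 + \frac{\alpha_n w}{1 + Lw\sum_{m=n+1}^{N} \alpha_m + (L-1)\alpha_n w}\right],$$

where the maximum is taken over all $\alpha_1, \ldots, \alpha_N$ such that
$\sum_{n=1}^{N} \alpha_n = 1$.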

Defining similarly the symmetric capacity ratio as

$$\eta_{suc,sd}(N, L, w) = \frac{C_{suc,sd}(N, L, w)}{C_{sym}(L, w)},$$

we have that $\lim_{N \to \infty} \eta_{suc,sd}(N, L, w) = 1$ for all $L$ and $w$.
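
The convergence of this ratio can be explored with a rough numerical sketch. It rests on several assumptions: the symmetric capacity is read as $\frac{1}{2L}\log(1 + Lw)$; the nth sub-user of a user has SNR $\alpha_n w$ and sees as noise the sub-users $n+1, \ldots, N$ of all $L$ active users plus the nth sub-users of the other $L-1$ active users; and the power split is uniform ($\alpha_n = 1/N$), which only lower-bounds the maximum over splits. All function names are hypothetical.

```python
import math


def c_sym(L, w):
    """Assumed symmetric capacity per user, in nats."""
    return math.log1p(L * w) / (2 * L)


def c_suc_sd_uniform(N, L, w):
    """Per-user rate with N equal-power sub-users (alpha_n = 1/N) decoded
    level by level; an assumed reading of the scheme, not the optimum split."""
    a = 1.0 / N
    total = 0.0
    for n in range(1, N + 1):
        # Undecoded sub-users n+1..N of all L active users, plus the
        # nth sub-users of the other L-1 active users.
        interference = L * w * a * (N - n) + (L - 1) * a * w
        total += 0.5 * math.log1p(a * w / (1 + interference))
    return total


def eta_uniform(N, L, w):
    return c_suc_sd_uniform(N, L, w) / c_sym(L, w)


if __name__ == "__main__":
    for N in (1, 10, 100, 1000):
        print(N, eta_uniform(N, 4, 10.0))
```

With $N = 1$ the expression reduces to the single-user rate with all other users treated as noise, and the ratio grows towards one as $N$ increases.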

Finally,  the  modified  successive  decoding  strategy  can  also 
be  extended  to  include  multirate  users,  where  each  user  uses 
different  subsets  of  the  sub-users  depending  on  its  desired  rate 
of  transmission. 
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Abstract  —  We  introduce  a  proper  framework  of 
coding  problems  for  a  quantum  memoryless  channel 
and derive an asymptotic formula for the channel capacity
having an operational significance. Some general
lower and upper bounds for the quantum channel
capacity are also derived.

I.  Introduction 

In order to consider a communication system which is described
by quantum mechanics, we must reformulate information
(communication) theory in terms of quantum mechanical
language. However, most of the previous works [1] seem
unsatisfactory since they hastily invoke a priori analogy between
the classical and the quantum communication systems based
on the ostensible similarity of various quantum entropies to
the classical ones. One of the reasons for the immaturity of
quantum information theory lies in the lack of asymptotic
approaches, although there are a small number of excellent
exceptions such as [2]. The purpose of this paper is to present
a proper framework of coding problems for a quantum memoryless
channel and to derive an asymptotic formula for the
operational channel capacity [3].

II.  Quantum  Channel 

We here restrict ourselves to finite dimensional Hilbert spaces
and to generalized measurements which take values on finite
sets for simplicity. Letting $\mathcal{S}(\mathcal{H}_j)$ be the set of states on
Hilbert spaces $\mathcal{H}_j$, a quantum channel for an input system
$\mathcal{H}_1$ and an output system $\mathcal{H}_2$ is described by an affine map
$\Gamma : \mathcal{S}(\mathcal{H}_1) \to \mathcal{S}(\mathcal{H}_2)$. In order to investigate asymptotic
properties, we consider the nth extension of the system described
by the tensor product $\mathcal{H}^{\otimes n} = \mathcal{H} \otimes \cdots \otimes \mathcal{H}$. This extension
corresponds to the situation where the sender transmits n
states successively, which is represented by the state
$\sigma_1 \otimes \cdots \otimes \sigma_n$ on $\mathcal{H}^{\otimes n}$. The extended quantum channel for
extended input and output systems $\mathcal{H}_1^{\otimes n}$ and $\mathcal{H}_2^{\otimes n}$ is defined
by an affine map $\Gamma^{(n)} : \mathcal{S}(\mathcal{H}_1^{\otimes n}) \to \mathcal{S}(\mathcal{H}_2^{\otimes n})$. Now, a
channel is called memoryless if

$$\Gamma^{(n)}(\sigma_1 \otimes \cdots \otimes \sigma_n) = (\Gamma\sigma_1) \otimes \cdots \otimes (\Gamma\sigma_n).$$

Since a memoryless channel is thus determined uniquely
by $\Gamma$, we often drop the superscript $(n)$ for simplicity.

III.  Quantum  Channel  Coding  Theorem 

We first prepare a finite set of quantum states on $\mathcal{H}_1^{\otimes n}$,
called the quantum codebook, $C_n = \{\sigma^{(n)}(1), \ldots, \sigma^{(n)}(M_n)\}$,
each element of which is an n-tensor product of states on $\mathcal{H}_1$:
$\sigma^{(n)}(k) = \sigma_1(k) \otimes \cdots \otimes \sigma_n(k)$. The transmitter first selects a
codeword $\sigma^{(n)} = \sigma_1 \otimes \cdots \otimes \sigma_n$ which corresponds to the message
to be transmitted (encoding), and then transmits each
signal $\sigma_1, \ldots, \sigma_n$ successively through a memoryless channel
$\Gamma$. The receiver then receives signals $\Gamma\sigma_1, \ldots, \Gamma\sigma_n$ and, by
means of a certain measuring process, he estimates which signal
among $C_n$ has been actually transmitted (decoding). In
this case, the decoder is described by a $C_n$-valued measurement
$T^{(n)}$ over $\mathcal{H}_2^{\otimes n}$. By fixing a decoder $T^{(n)}$ arbitrarily,
the error probability $P_e(C_n, T^{(n)})$ averaged over the code becomes
well-defined in the classical sense. The average error
probability for this codebook $P_e(C_n)$ is defined as the infimum
of that over all possible decoders $T^{(n)}$. Further, the quantity
$R_n = \log M_n / n$ is called the rate for the code $C_n$. Consider
sequences of codes $\{C_n\}_n$ which satisfy $\lim_{n \to \infty} P_e(C_n) = 0$,
and denote the supremum of $\lim_{n \to \infty} R_n$ over such sequences
by $C(\Gamma)$, which is called the capacity of the channel $\Gamma$.

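We establish the relation between the capacity $C(\Gamma)$ and
the mutual information. By fixing arbitrarily a measurement
$\Pi^{(n)}$ (the totality of which we denote by $\mathfrak{M}^{(n)}$) on a certain
finite set (not necessarily $C_n$-valued) over $\mathcal{H}_2^{\otimes n}$, we have the
(classical) mutual information

$$I^{(n)}(p^{(n)}, \Pi^{(n)}; \Gamma) \stackrel{\mathrm{def}}{=} \sum_{\sigma^{(n)}} p^{(n)}(\sigma^{(n)})\, D_{\Pi^{(n)}}(\Gamma\sigma^{(n)} \,\|\, \Gamma\rho^{(n)}).$$

Here $p^{(n)}(\sigma^{(n)}) = p^{(n)}(\sigma_1, \ldots, \sigma_n)$ is an arbitrary joint distribution
over $\mathcal{S}(\mathcal{H}_1)^n = \mathcal{S}(\mathcal{H}_1) \times \cdots \times \mathcal{S}(\mathcal{H}_1)$ (the totality
of which we denote by $\mathfrak{P}^{(n)}$), $D_{\Pi^{(n)}}$ is the Kullback-Leibler
divergence between the classical probability distributions
$\mathrm{Tr}[(\Gamma\sigma^{(n)})\Pi^{(n)}(\cdot)]$ and $\mathrm{Tr}[(\Gamma\rho^{(n)})\Pi^{(n)}(\cdot)]$, and $\rho^{(n)} \stackrel{\mathrm{def}}{=}
\sum_{\sigma_1, \ldots, \sigma_n} p^{(n)}(\sigma_1, \ldots, \sigma_n)\, \sigma_1 \otimes \cdots \otimes \sigma_n$. It is shown that, for
a memoryless channel $\Gamma$, the quantity

$$C^{(n)}(\Gamma) \stackrel{\mathrm{def}}{=} \sup_{p^{(n)} \in \mathfrak{P}^{(n)}} \sup_{\Pi^{(n)} \in \mathfrak{M}^{(n)}} I^{(n)}(p^{(n)}, \Pi^{(n)}; \Gamma)$$

exhibits the superadditivity $C^{(m+n)}(\Gamma) \ge C^{(m)}(\Gamma) + C^{(n)}(\Gamma)$,
which is in remarkable contrast to the classical channel. The
following theorem gives the quantum counterpart of the channel
coding theorem:

Theorem 1 For a memoryless channel $\Gamma$,

$$C(\Gamma) = \lim_{n \to \infty} \frac{C^{(n)}(\Gamma)}{n} = \sup_{n} \frac{C^{(n)}(\Gamma)}{n}.$$
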
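
Once a measurement is fixed, the mutual information defined above is an ordinary classical quantity: each input state induces an outcome distribution, and the mutual information is the prior-weighted Kullback-Leibler divergence from each outcome distribution to their mixture. The sketch below computes this for $n = 1$ from outcome distributions supplied directly; the function names and the two-state example are illustrative assumptions, logarithms are base 2, and the priors are assumed to be strictly positive.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p||q) in bits for finite distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def measured_mutual_information(prior, outcome_dists):
    """sum_k prior[k] * D(P_k || P_mix), where P_k is the outcome
    distribution induced by input state k and P_mix is their mixture."""
    n_outcomes = len(outcome_dists[0])
    mix = [sum(pk * dist[j] for pk, dist in zip(prior, outcome_dists))
           for j in range(n_outcomes)]
    return sum(pk * kl_divergence(dist, mix)
               for pk, dist in zip(prior, outcome_dists))


if __name__ == "__main__":
    # Assumed example: states |0> and |+> sent with equal probability over a
    # noiseless channel and measured in the computational basis.
    dists = [[1.0, 0.0], [0.5, 0.5]]
    print(measured_mutual_information([0.5, 0.5], dists))
```

Taking the supremum of this quantity over input distributions and measurements, as in the definition of $C^{(n)}(\Gamma)$, is the hard part and is not attempted here.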

The quantum channel capacity $C(\Gamma)$ is compared with other
capacity-like quantities to obtain general lower and upper
bounds: $C^{(R)}(\Gamma) = C^{(1)}(\Gamma) \le C(\Gamma) \le \tilde{C}(\Gamma)$, where $C^{(R)}(\Gamma)$ is
the capacity when restricted to the recursive decoding, $C^{(1)}(\Gamma)$
the capacity for signals of unit length, and $\tilde{C}(\Gamma)$ the pseudo-capacity
defined via formal quantum mutual information [3].
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Abstract  We  examine  the  effect  of  a 
randomly  time-varying  channel  on  mutual 
information  between  receiver  and  sender  when 
the  channel  is  mth  order  Markov. 

We investigate the effect of a randomly time-varying
channel upon the mutual information between sender and
receiver. Such channels often occur in mobile
communications and can affect the achievable rate. If the
channel is perfectly known, then the mutual information
between a receiver and an arbitrary number of senders may
be found, even if the channel is time-varying [1]. In this
paper we consider the case of a single receiver and sender
pair to set the framework for the more interesting multiple
access case.

We consider a discrete-time matrix model for our
channel. Let random variable S[i] denote the input at time
i, Y[j] the output at time j, N[j] the additive white Gaussian
noise at time j and G[j,i] the multiplicative effect of the
channel on the output at time j due to the input at time i
(the channel's tap at time j corresponding to a delay of j-i).
The channel is assumed to be causal and to have finite
memory limited to $\Delta$ time samples, therefore G[j,i] is zero
for j-i>$\Delta$ and j-i<0. Let us assume that S[i] for any i<0 is
zero. Let a subscript on a random variable indicate the
vector of random variables from times 1 to k, a double
subscript on G the corresponding matrix and a single
subscript k on G indicate that we are considering the kth
row of $G_{k,k}$. $G_{k,k}$ is block-diagonal and $G_i$ is given by
$[0, \ldots, G[i,i-\Delta], \ldots, G[i,i-1], G[i,i], \ldots, 0]$. The effect of
the channel is given by

$$Y_k = G_{k,k} S_k + N_k \qquad [1]$$

Let us take the channel to be such that any row of $G_{k,k}$
depends on at most m preceding rows, i.e. that the ith row
conditioned on rows i-1 through i-m is independent of row
i-m-1. In steady-state, it is equivalent to stating that the
ith row conditioned on rows i+1 through i+m is
independent of row i+m+1. Such a model is that of an mth
order Markov chain. Under some conditions of wide-sense
stationarity, we may state that

$$< \lim_{i \to \infty} I(G_i; S_i \mid Y_i, \{G_{i+1} \ldots G_{i+m}\}) \qquad [2]$$

where both the RHS and LHS reach a limit. The LHS
represents the loss incurred by not knowing the channel
and the RHS is the information that the input gives about
the rate of change of the channel.

Suppose, as a special case, that we can describe the
channel by a Gauss-Markov model. We assume that, at any
time, the taps of the channel are mutually independent and
that the expected energy of the tap corresponding to a
given delay does not change in time. Let $T_c$ be the
coherence time of the channel, roughly the inverse of the
Doppler spread. Let $T_s$ be the time spread of the channel,
proportional to $\Delta$. The channel may be modelled as
becoming decorrelated in time exponentially with rate
inversely proportional to $T_c$. We may write that:

$$G[j,i] = \alpha\, G[j-1,i-1] + E[j,i] \qquad [3]$$

where $\alpha$ is $1/WT_c$. Since the expected energy remains
unchanged, the expected energy of E[j,i] is proportional to
$(1-\alpha^2)$. We send a white Gaussian signal.
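
The role of the $(1-\alpha^2)$ factor can be seen from the second-moment recursion of a single tap. A minimal sketch, assuming a scalar tap updated as g_next = alpha * g + e with e zero-mean, independent of g, and of energy $(1-\alpha^2)$ times the stationary tap energy; the function names and numerical values are illustrative only.

```python
def tap_energy_trajectory(alpha, tap_energy, steps, initial_energy=None):
    """Expected energy of one tap over time under g_next = alpha * g + e,
    where E[e^2] = (1 - alpha**2) * tap_energy and e is independent of g."""
    innovation = (1.0 - alpha ** 2) * tap_energy
    energy = tap_energy if initial_energy is None else initial_energy
    history = [energy]
    for _ in range(steps):
        # Second moments add because the innovation is independent of g.
        energy = alpha ** 2 * energy + innovation
        history.append(energy)
    return history


def correlation_after(alpha, lag):
    """Normalised correlation between a tap and its value `lag` samples later."""
    return alpha ** lag


if __name__ == "__main__":
    print(tap_energy_trajectory(0.9, 2.0, 5))
    print([correlation_after(0.9, k) for k in range(5)])
```

Starting at the stationary energy, the trajectory stays constant, and the correlation between a tap and its later values decays geometrically, which is the exponential decorrelation described above.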

We may show, for the channel model described above,


limn 
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limk 
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[4]. 


The smaller $\Delta$, i.e. the less dispersive in time the
channel, the faster the LHS of [4] goes to 0. The LHS
depends both on the coherence time $T_c$, i.e. on how fast
the channel decorrelates, and on the coherence bandwidth,
which is inversely related to $T_s$.

The above arguments can be extended to the multidimensional
case. They give some idea of the effect of an
imperfectly known channel upon interference
cancellation. In (2), for spread-spectrum systems, when
the input is perfectly decoded, the effect of the channel
measurement error on interference cancellation is bounded.
These results should also give some indication as to the
usefulness of feedback. When $T_c$ is large, the mutual
information can be increased by optimizing the input
distribution for the user appropriately.
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Abstract  —  We  look  at  the  problem  of  transmitting 
information over time-varying channels with side information,
where for time-varying channels the statistics
of the channel change with time and by channel
side information we mean the current state of the
channel. We show that when this side information
is available at both the transmitter and the receiver,
then for the power-constrained channel, the power allocation
policy that achieves minimum end-to-end distortion
is not necessarily the same as the one required
for maximum transmission rate.

I.  Introduction 

A new challenge in telecommunication is the transmission of
information over time-varying channels where the statistics of
the channel change with time. Examples of such time-varying
channels are wireless links where, due to multi-path fading and
interference from other users, the received signal strength can
vary within a few orders of magnitude. Traditionally, the preferred
transmission method has been to make the channel behave
or look like a channel with uniformly distributed errors,
e.g. through the use of interleaving. Once this is achieved, the
problem of communication is no harder than it used to be
and all the classical methods and tools can be used. It is
well-known that this "average channel" method is inherently
sub-optimal [1][2]. However, to achieve higher channel capacity,
it is required to provide channel state side information to
either the transmitter or the receiver.

II.  Time-varying  channels  with  side  information 

We consider the state process with sample space $\mathcal{I}$ where at
each time instant the channel is at one of these states and
hence has different statistics. For example, consider an AWGN
channel, where the noise power is modulated in accordance
with the channel state. Based on the availability of the current
channel state side information, we can distinguish the following
four different cases: (I): Informed receiver and transmitter,
(II): Informed receiver, (III): Informed transmitter
and (IV): Average channel. In this paper, we concentrate on
case I. Note that providing the current channel state does not
imply a knowledge about the distribution of the states. In
fact, we assume that neither the receiver nor the transmitter
is aware of this distribution. It is well-known that the capacity
of the channel is given by $C = \sum_{i \in \mathcal{I}} q_i I(X_i, Y_i)$ where $q_i$
is the probability of the channel being at state $i$ and $I(X_i, Y_i)$
is the mutual information between the channel input and output
processes at this state. Note that the policy that achieves
this capacity is independent of the channel state distribution
($q_i$). Also since the distribution of the states is unknown, the
capacity of the channel is also not known. By policy, here,
we mean the distribution of the input channel alphabets that
maximizes $I(X_i, Y_i)$.

We can then show that the minimum end-to-end distortion
is given by:

$$D_m = \sum_{i \in \mathcal{I}} q_i D(I(X_i, Y_i)). \qquad (1)$$



Fig. 1: Performance of minimum distortion and maximum capacity
policies vs. $\beta$ ($D(R) = 2^{-\beta R}$) over a narrow-band Rayleigh fading
channel.

Note that had the channel state distribution been also provided
to the transmitter then the channel capacity would have
been known and $D_m = D(\sum_{i \in \mathcal{I}} q_i I(X_i, Y_i))$. In the following
section, we look at the power-constrained channels and
show that the power allocation policy that achieves minimum
end-to-end distortion is not necessarily the same as the one
required for maximum transmission rate.
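
The gap between the two distortion expressions is an instance of Jensen's inequality: for a convex distortion-rate function, the average of the per-state distortions is at least the distortion at the average rate. A minimal sketch, assuming the distortion-rate function $D(R) = 2^{-\beta R}$ and made-up state probabilities and per-state mutual informations; all names and numbers are illustrative.

```python
def distortion(rate, beta):
    """Assumed distortion-rate function D(R) = 2**(-beta * R)."""
    return 2.0 ** (-beta * rate)


def per_state_distortion(q, rates, beta):
    """sum_i q_i D(I_i): the source is coded separately for each state."""
    return sum(qi * distortion(r, beta) for qi, r in zip(q, rates))


def average_rate_distortion(q, rates, beta):
    """D(sum_i q_i I_i): distortion at the state-averaged rate."""
    return distortion(sum(qi * r for qi, r in zip(q, rates)), beta)


if __name__ == "__main__":
    q = [0.5, 0.5]          # assumed state probabilities
    rates = [1.0, 3.0]      # assumed per-state mutual informations (bits)
    print(per_state_distortion(q, rates, 2.0))
    print(average_rate_distortion(q, rates, 2.0))
```

The larger $\beta$ is, the more convex $D(R)$ becomes and the wider the gap between the two values.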

III.  Power-constrained  Channel 

We are considering channels with a constraint on the average
transmitted power $\bar{S} = \sum_{i \in \mathcal{I}} q_i S_i$ where $S_i$ is the transmission
signal power at state $i$. Moreover, we characterize the channel
states based on the received signal to noise ratio ($\gamma$). It is
then straightforward to show that the following policy results
in channel capacity: $S(\gamma)/\bar{S} = 1/\gamma_c - 1/\gamma$ if $\gamma > \gamma_c$ and
0 otherwise [2], where $\gamma_c$ is the cut-off signal to noise ratio
which is set so that the constraint on average signal power is
met. If we now assume that the source has the distortion rate
function $D(R) = 2^{-\beta R}$ then the optimum policy that results
in minimum end-to-end distortion is given by:


7) 

s 


7  7  >  7  c 

7  <  7c 


(2) 


which is dependent on the source through $\beta$ [3]. In fact, the
more convex the distortion-rate function (the higher the value
of $\beta$) the more dissimilar the above policies become. Figure
1 shows the performance of these two policies over a narrow-band
Rayleigh fading channel where the received SNR $\gamma$ has an
exponential distribution ($f(\gamma) = \frac{1}{\bar{\gamma}} \exp(-\gamma/\bar{\gamma})$).
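
The capacity-achieving policy $S(\gamma)/\bar{S} = 1/\gamma_c - 1/\gamma$ can be sketched for a finite set of channel states, with the cut-off found by bisection so that the average-power constraint is met. The discrete state set, the function name, and the numbers are illustrative assumptions; the state probabilities are assumed to sum to one. The distortion-optimal policy is not reproduced here.

```python
def water_filling(q, gammas, tol=1e-12):
    """Return (gamma_c, allocations), where allocations[i] = S_i / S_bar =
    max(0, 1/gamma_c - 1/gammas[i]) and sum_i q[i] * allocations[i] = 1."""
    def average_power(level):
        return sum(qi * max(0.0, level - 1.0 / g) for qi, g in zip(q, gammas))

    # level = 1/gamma_c; average_power is nondecreasing in level and is at
    # least 1 at the upper end of the bracket when the q[i] sum to one.
    lo, hi = 0.0, 1.0 + max(1.0 / g for g in gammas)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average_power(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    level = 0.5 * (lo + hi)
    return 1.0 / level, [max(0.0, level - 1.0 / g) for g in gammas]


if __name__ == "__main__":
    gamma_c, alloc = water_filling([0.5, 0.5], [0.1, 10.0])
    print(gamma_c, alloc)
```

In the example the poor state falls below the cut-off and receives no power, so all of the average power budget is spent in the good state.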
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Abstract  —  We  extend  Partial  Response  (PR)  pre- 
coding  [1]  to  two-dimensions  and  consider  it,  as 
well as parallel one-dimensional (1D) PR, for use in
parallel readout optical memory systems. We also
develop expressions for optically implementable
two-dimensional (2D) zero-forcing equalizers to be
used in conjunction with these forms of PR
precoding.

I.  SUMMARY 

Figure 1 depicts a behavioral model of an array of abutted rectangular
pixels being retrieved in parallel from a memory with a coherent
imaging readout system. The transfer function, $H(f_x, f_y)$,
describes the 2D spatial bandlimiting of the readout system and
also includes a frequency description of the shape of the pixels in
the memory.


Figure 1: Model of pixels being retrieved in parallel.


Figure 2 shows the reconstruction of an array of binary phase
pixels (with values +1 and -1) that have been precoded using what
we term 1D (1+D) PR precoding. ZERO values are formed by the
overlap of two pixels with opposite signs. ONE values are
obtained by the overlap of pixels with the same sign. Detection
of intensity takes place halfway between the centers of the two
pixels used to form the desired data value. 1D strips can be read
out in isolation [2] or can comprise the rows or columns of a 2D
array.

Figure 2: Example of 1D (1+D) parallel PR signaling. Panels: pixels
stored in memory (+1 +1 -1 -1 +1 -1 -1 +1); pixels retrieved in
parallel (intensity).

2D arrays can also be precoded using 2D PR precoding. With 2D
(1+D) PR precoding, each data value is formed at the center of
four overlapping reconstructed pixels as illustrated in Figure 3.
This form of precoding can be applied to 2D arrays that experience
spatial bandlimiting or to 1D arrays read out in succession
that are broadened temporally. We introduce two shift operators
$D_x$ and $D_y$ to describe this 2D broadening. With this notation,
the system polynomial for a reconstructed pixel broadened in
two dimensions can be written as:

$$\sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} H_{ij} D_x^i D_y^j$$

To accomplish 2D PR precoding, a 2D array can be thought of as
a 1D array, precoded as such, and then returned to its 2D format.


This research was supported by ONR grant #N00014-93-1-0414 and by
the AFOSR under grants #F49620-93-1-0057 and #F49620-93-1-0371.


Figure 3: $(1+D_x)(1+D_y)$ PR precoding example. Panels: array to be
precoded; unwrapped impulse response.


Towards this end, the 2D system polynomial is made into a 1D
system polynomial by substituting $D$ for $D_x$ and $D^M$ for $D_y$,
where $M$ is $(N_y-1)$ plus the number of bits in each row of the 2D
array to be precoded. For $(1+D_x)(1+D_y)$ PR precoding performed
serially, one would arbitrarily choose the first row of the precoded
array and the first bit of the next row. $(1+D+D^M+D^{M+1})$
PR precoding is then applied to bits read from the 2D array to be
precoded, as described above. After precoding M-1 bits, an additional
arbitrary bit would be inserted in the input bit stream of
the precoder to start a new row.
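
The substitution of $D$ for $D_x$ and $D^M$ for $D_y$ can be sketched as a mapping from 2D coefficient indices to 1D exponents. The function name, the coefficient layout (`coeffs_2d[i][j]` holding $H_{ij}$), and the row length of 8 bits are illustrative assumptions.

```python
def unwrap_polynomial(coeffs_2d, M):
    """Map sum_ij H_ij Dx^i Dy^j to a 1D polynomial in D using
    Dx -> D and Dy -> D**M. Returns {exponent: coefficient}."""
    poly_1d = {}
    for i, row in enumerate(coeffs_2d):
        for j, h in enumerate(row):
            if h != 0:
                exponent = i + M * j
                poly_1d[exponent] = poly_1d.get(exponent, 0) + h
    return poly_1d


if __name__ == "__main__":
    # (1 + Dx)(1 + Dy): Nx = Ny = 2 and every H_ij equals 1.
    h = [[1, 1], [1, 1]]
    bits_per_row = 8                    # assumed row length
    M = (len(h[0]) - 1) + bits_per_row  # (Ny - 1) plus bits per row
    print(sorted(unwrap_polynomial(h, M).items()))
```

For the $(1+D_x)(1+D_y)$ case the nonzero exponents are 0, 1, M and M+1, matching the $(1+D+D^M+D^{M+1})$ polynomial used for serial precoding.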

Zero-forcing equalizers for both 1D and 2D PR signaling applied
to 2D arrays of binary phase pixels are easily represented in the
Fourier domain by extending the work in reference [3]. These
equalizers can be implemented as apodizers in the Fourier plane
of an optical system. Table 1 lists the equalizers and overall
transfer functions for minimum bandwidth one-to-one imaging
systems using $(1+D_x)$ and $(1+D_x)(1+D_y)$ signaling.


Signaling          Transfer Function       Equalizer

(1+D_x)            cos(πf_x) rect(f_x)     cos(πf_x) rect(f_x) / [sinc(f_x) sinc(f_y)]

(1+D_x)(1+D_y)     cos(πf_x) cos(πf_y)     cos(πf_x) cos(πf_y) / [sinc(f_x) sinc(f_y)]

Table 1: Phase terms are omitted. This is compensated for by detecting
between the centers of reconstructed pixels.
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Abstract -  A  novel  technique  for  trellis  decoding  of 
both RLL and balanced block codes on PR channels is
described. The technique allows performance
improvement without an increase in decoder
complexity.

1.  INTRODUCTION 

Recently,  a  simple  technique  for  constructing  run  length 
limited  (RLL)  block  error  control  codes  (ECC)  together 
with  their  minimal  trellises  has  been  introduced  [1,2]. 
The procedure adopted for the design of such codes is
based  on  taking  a  linear  ECC  and  incorporating  a 
maximum  runlength  constraint  by  carefully  modifying 
the  basis  code  while  retaining  the  minimum  distance 
properties  of  the  parent  code.  Such  codes  are 
particularly  suited  for  magnetic  recording  applications 
where  the  (1-D)  partial  response  (PR)  channel  provides 
a  good  model  at  low  information  density  rates  [3]. 

In  this  paper  we  show  that  the  trellis  decoder  of  these 
codes  has  the  same  trellis  structure  as  the  encoder,  thus 
the  additional  decoding  complexity  is  avoided.  We  also 
describe  how  the  trellis  diagram  of  the  non-linear 
balanced  codes  can  be  incorporated  within  the  PR 
channel. 

2.  DECODING  OF  LINEAR  BLOCK  CODES  ON 
PARTIAL  RESPONSE  CHANNELS 

When binary sequences are transmitted over the (1-D)
PR channel the received noiseless sequence is ternary
and, due to the memory of the PR channel, contains
some additional structure that can be exploited to
improve the error performance. For uncoded data, maximum
likelihood decoding (MLD) on the PR channel can be realised
by using a Viterbi decoder, because the memory introduced
by the (1-D) channel has a trellis structure [4]. Similarly, for
RLL/ECCs MLD can readily be achieved by
incorporating the trellis diagram of the code within the
decoding trellis of the PR channel. Furthermore, the
complexity of the trellis does not increase because the
modified RLL codes have an odd number of 1s in the
labelling of each branch of the trellis and hence the
state of the PR channel is the same for all branches
emanating from the same state. Thus all that needs to be
changed is the branch labels of the RLL/ECC trellis.
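
The ternary output and its extra structure can be seen in a short sketch of the noiseless (1-D) channel. It assumes inputs in {0, 1}, an initial channel state of 0, and the output y_k = x_k - x_{k-1}; the function name is hypothetical.

```python
def one_minus_d_output(bits, initial=0):
    """Noiseless output y_k = x_k - x_{k-1} of the (1-D) channel for a
    binary input sequence, starting from the assumed state `initial`."""
    out = []
    prev = initial
    for x in bits:
        out.append(x - prev)
        prev = x
    return out


if __name__ == "__main__":
    y = one_minus_d_output([1, 1, 0, 1, 0, 0])
    print(y)
    # The nonzero output symbols alternate in sign.
    print([v for v in y if v != 0])
```

Every output symbol lies in {-1, 0, +1}, and the nonzero symbols alternate in sign, which is one form of the channel memory that a trellis decoder can exploit.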

3. SIMULATION RESULTS AND
CONCLUSIONS

The effect of this technique on the decoder performance
for some codes has been assessed by comparing
simulation results for the new decoding strategy with
the conventional approach. It has been found that the
technique provides a performance improvement
exceeding 2 dB at error rates of about $10^{-5}$.

The technique has also been applied to balanced
codes. Although these codes are non-linear, they
possess a regular trellis structure which allows
Viterbi decoding. The technique has been applied to the
(16,9,6,5,4) code [5] and simulation results have confirmed
the efficiency of the technique.
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Abstract  —  We  investigate  sets  of  maximal  number 
of  fixed-length  sequences  that  can  be  concatenated 
without  violating  certain  constraints  often  required  in 
class-IV  PRML  channels  used  in  magnetic  recording. 

I.  Introduction 

PRML is a technique that combines partial-response (PR) signaling
with maximum-likelihood (ML) sequence estimation in
order to combat intersymbol interference and noise, which are
common in high density digital magnetic storage channels [3].
One of the most common partial response channels used in
magnetic recording is the class-IV channel. Such a channel
independently processes the even and the odd subsequences, that
is, the bits with even and odd indices, respectively. Hence, a Viterbi
detector can be separately applied to each of the even and
the odd output subsequences to obtain maximum-likelihood
estimates of the input subsequences.

In order to limit the path memory of the Viterbi detector, the
number of consecutive zeros in each of the input subsequences is
upper bounded by some positive integer $I$. Also, to maintain clock
synchronization, the number of consecutive zeros in the global
input sequence is upper bounded by some positive integer $G$. We
say that a binary sequence satisfies the $(0,G/I)$ constraint if it
satisfies the two constraints specified by $G$ and $I$. Notice that
if the number of consecutive zeros in each of the even and odd
subsequences of a sequence is upper bounded by $I$, then the number
of consecutive zeros in the sequence itself is upper bounded by
$2I$. Hence, we assume in the following that $G$ and $I$ are
positive integers such that $G \le 2I$. Coding schemes are used to
map unconstrained sequences of data into $(0,G/I)$ constrained
sequences for transmission over the channel [2], [3]. In this
paper, we consider schemes based on block codes.

II. $(0,G/I)$ Constrained Block Codes

A $(0,G/I)$ block code is a set of $(0,G/I)$ constrained binary
sequences, called codewords, of fixed length such that any
juxtaposition of a finite number of codewords is also $(0,G/I)$
constrained. For given $G$ and $I$, let
$\mathcal{M}^{G,I}_{l,l_1,l_2;r,r_1,r_2}(n)$ be the set of all
$(0,G/I)$ constrained sequences $(y_1, y_2, \ldots, y_n)$ of length
$n$ with at most $l$, $l_1$, and $l_2$ leading zeros at the
beginning of the sequence $(y_1, y_2, \ldots, y_n)$, its odd
subsequence $(y_1, y_3, \ldots, y_{2\lceil n/2\rceil - 1})$, and its
even subsequence $(y_2, y_4, \ldots, y_{2\lfloor n/2\rfloor})$,
respectively, and at most $r$, $r_1$, and $r_2$ leading zeros at the
beginning of the reversed sequence $(y_n, y_{n-1}, \ldots, y_1)$,
its odd subsequence
$(y_n, y_{n-2}, \ldots, y_{\lfloor n/2\rfloor - \lceil n/2\rceil + 2})$,
and its even subsequence
$(y_{n-1}, y_{n-3}, \ldots, y_{\lceil n/2\rceil - \lfloor n/2\rfloor + 1})$,
respectively. Let $M^{G,I}_{l,l_1,l_2;r,r_1,r_2}(n)$ be the
cardinality of $\mathcal{M}^{G,I}_{l,l_1,l_2;r,r_1,r_2}(n)$. Any
$(0,G/I)$ constrained block code of length $n$ is a subset of
$\mathcal{M}^{G,I}_{l,l_1,l_2;G-l,I-l_2,I-l_1}(n)$ for some $l$,
$l_1$, and $l_2$. Conversely, if $n$ is sufficiently large, any
subset of $\mathcal{M}^{G,I}_{l,l_1,l_2;G-l,I-l_2,I-l_1}(n)$ forms a
$(0,G/I)$ constrained block code. Hence, to construct efficient
$(0,G/I)$ block codes of length $n$, it is important to determine an
option $(l, l_1, l_2)$ that maximizes
$M^{G,I}_{l,l_1,l_2;G-l,I-l_2,I-l_1}(n)$. Such an option will be
called optimal for the given $G$, $I$, and $n$. Two special cases
were investigated by Eggenberger and Patel [1]. They determined that
the option $(2,2,2)$ is optimal in case $G = I = 4$ and $n = 9$,
while the option $(1,3,3)$ is optimal in case $G = 3$, $I = 6$, and
$n = 9$.

1 Most of this work was done while the first author was visiting
the Dept. of Elec. Eng., Delft Univ. of Tech. The first author was
also supported in part by NSF under grant NCR 91-15423.
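
The definition above can be explored by brute force for short lengths. The following Python sketch enumerates the candidate set for a given option; the function names are illustrative and not taken from the paper, and the trailing bounds are taken as $(G-l, I-l_2, I-l_1)$, which is an assumption about how the leading and trailing limits pair up across a codeword boundary. For $G = I = 4$, $n = 9$ and the option $(2,2,2)$ reported by Eggenberger and Patel, it can be used to count the available codewords and to confirm that any two of them concatenate without violating the constraint.

```python
from itertools import product


def max_zero_run(bits):
    """Length of the longest run of consecutive zeros in bits."""
    best = run = 0
    for b in bits:
        run = 0 if b else run + 1
        best = max(best, run)
    return best


def leading_zeros(bits):
    """Number of zeros before the first one (len(bits) if there is no one)."""
    for k, b in enumerate(bits):
        if b:
            return k
    return len(bits)


def satisfies(bits, G, I):
    """True if bits meets the (0,G/I) constraint: global zero runs <= G,
    zero runs in the odd- and even-indexed subsequences <= I."""
    return (max_zero_run(bits) <= G
            and max_zero_run(bits[0::2]) <= I
            and max_zero_run(bits[1::2]) <= I)


def candidate_set(n, G, I, option):
    """Words of length n that are (0,G/I) constrained and respect the
    leading-zero limits (l, l1, l2) and the assumed trailing limits
    (G - l, I - l2, I - l1)."""
    l, l1, l2 = option
    r, r1, r2 = G - l, I - l2, I - l1
    words = []
    for w in product((0, 1), repeat=n):
        rev = w[::-1]
        if (satisfies(w, G, I)
                and leading_zeros(w) <= l
                and leading_zeros(w[0::2]) <= l1
                and leading_zeros(w[1::2]) <= l2
                and leading_zeros(rev) <= r
                and leading_zeros(rev[0::2]) <= r1
                and leading_zeros(rev[1::2]) <= r2):
            words.append(w)
    return words


if __name__ == "__main__":
    code = candidate_set(9, 4, 4, (2, 2, 2))
    print(len(code), "candidate codewords for (0,4/4), n = 9")
```

Since the set contains more than 256 words, a rate 8/9 block code can be drawn from it.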

III.  Results 

The main contribution of this paper is a set of general results
concerning optimal options for all values of $G$, $I$, and $n$.
The results are given in the following three theorems, which
address the cases $G = 1$; $G \ge 2$ and $I$ even; and $G \ge 2$
and $I$ odd, respectively.

Theorem 1 For $G = 1$, $I \ge 1$, and $n \ge 1$,
$(0, 0, \lfloor I/2 \rfloor)$ is an optimal option.

Theorem 2 For $2 \le G \le 2I$, $I$ even, and $n \ge 1$,
$(\lfloor G/2 \rfloor, I/2, I/2)$ is an optimal option, except in
the case $G = I = 2$ and $n = 6$, where the option $(0,0,1)$ is
optimal.

Theorem 3 For $2 \le G \le 2I$, $I$ odd, and $n \ge 1$, at least
one of the following three options is optimal:

$(\min\{\lfloor G/2 \rfloor, I-1\},\ (I-1)/2,\ (I-1)/2)$,
$(\min\{\lfloor G/2 \rfloor, I-1\},\ (I-1)/2,\ (I+1)/2)$,
$(\lfloor G/2 \rfloor,\ (I+1)/2,\ (I-1)/2)$.

Theorems 1 and 2 explicitly specify an optimal option in case
$G = 1$ or $I$ is even. In particular, the results of Eggenberger
and Patel follow as special cases of Theorem 2. In case $G \ge 2$
and $I$ is odd, our results specify three candidates for an
optimal option. In general, the optimal options in this case
may depend very much on the length $n$, as demonstrated in
the following result.

Theorem 4 For $G = 2$, $I = 1$, and $n \ge 1$, $(1,1,0)$ is an
optimal option if $n = 1$ or $n \not\equiv 1 \pmod 4$, and
$(0,0,0)$ is an optimal option if $n \ne 1$ and
$n \equiv 1 \pmod 4$.
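
The dependence on $n$ stated in Theorem 4 can be observed numerically for short lengths. The sketch below is an illustration rather than the authors' procedure: it counts, for every option $(l, l_1, l_2)$, the sequences that meet the $(0,G/I)$ constraint and the leading and trailing limits, assuming the trailing limits are $(G-l, I-l_2, I-l_1)$. All function names are hypothetical.

```python
from itertools import product


def lead(bits):
    """Number of zeros before the first one (len(bits) if there is no one)."""
    for k, b in enumerate(bits):
        if b:
            return k
    return len(bits)


def max_run(bits):
    """Length of the longest run of zeros."""
    best = run = 0
    for b in bits:
        run = 0 if b else run + 1
        best = max(best, run)
    return best


def count(n, G, I, option):
    """Number of length-n words admitted by the given option."""
    l, l1, l2 = option
    r, r1, r2 = G - l, I - l2, I - l1   # assumed trailing limits
    total = 0
    for w in product((0, 1), repeat=n):
        rev = w[::-1]
        if (max_run(w) <= G
                and max_run(w[0::2]) <= I and max_run(w[1::2]) <= I
                and lead(w) <= l
                and lead(w[0::2]) <= l1 and lead(w[1::2]) <= l2
                and lead(rev) <= r
                and lead(rev[0::2]) <= r1 and lead(rev[1::2]) <= r2):
            total += 1
    return total


def best_options(n, G, I):
    """Largest count over all options, and the options attaining it."""
    counts = {(l, l1, l2): count(n, G, I, (l, l1, l2))
              for l in range(G + 1)
              for l1 in range(I + 1)
              for l2 in range(I + 1)}
    top = max(counts.values())
    return top, sorted(o for o, c in counts.items() if c == top)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, best_options(n, 2, 1))
```

Several options may tie for a given $n$; the theorem only asserts that the named option is among the best ones.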
Abstract — Let $S(N,q)$ be the set of all words of length
$N$ over the bipolar alphabet $\{-1,+1\}$ having a $q$-th order
spectral null at zero frequency. Any subset of $S(N,q)$ is
a spectral-null code of length $N$ and order $q$. In this paper,
we give an equivalent formulation of $S(N,q)$ in terms
of codes over the binary alphabet $\{0,1\}$. We show that
$S(N,2)$ is equivalent to a well-known class of single error
correcting, all unidirectional error detecting (SEC-AUED)
codes. We derive an explicit expression for the redundancy
of $S(N,2)$. Further, we give new efficient recursive design
methods for second-order spectral-null codes, improving
the redundancy of the codes found in the literature.

Regard the alphabet $\{-1,+1\}$ as a subset of the real field. The
following characterization of $S(N,q)$ is well known [7], [5] ($x_i$
denotes the $i$-th component of a vector $X$):

$S(N,q) = \left\{ X \in \{-1,+1\}^N : \sum_{j=1}^{N} j^i x_j = 0,\ i = 0, \ldots, q-1 \right\}.$

The problem of finding an explicit expression for the redundancy of
$S(N,q)$ was left open in [7]. Using a well-known result in number
theory (the problem of partitioning a natural number $n$ into $w$
distinct natural numbers less than or equal to a certain bound $b$),
we are able to derive the following explicit expression for the
redundancy of $S(N,2)$:

$N - \lfloor \log_2 |S(N,2)| \rfloor \approx 2\log_2 N - 1.141$, $N$ multiple of 4. (1)
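
For very short lengths the set $S(N,2)$ can be enumerated directly, which gives a feel for approximation (1). The Python sketch below is illustrative only (the function names are not from the paper); it tests the vanishing power sums $\sum_j j^i x_j = 0$, $i = 0, \ldots, q-1$, over all bipolar words and is therefore practical only for small $N$.

```python
from itertools import product
from math import floor, log2


def spectral_null_set(N, q=2):
    """All words in {-1,+1}^N whose first q power sums
    sum_j j**i * x_j (i = 0..q-1, j = 1..N) are zero."""
    return [x for x in product((-1, 1), repeat=N)
            if all(sum((j ** i) * xj for j, xj in enumerate(x, start=1)) == 0
                   for i in range(q))]


def redundancy(N, q=2):
    """N - floor(log2 |S(N,q)|), or None when S(N,q) is empty."""
    size = len(spectral_null_set(N, q))
    return None if size == 0 else N - floor(log2(size))


if __name__ == "__main__":
    for N in (4, 8, 12):
        approx = 2 * log2(N) - 1.141
        print(N, len(spectral_null_set(N)), redundancy(N), round(approx, 3))
```

The approximation in (1) is asymptotic, so only rough agreement should be expected at these lengths.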

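Further, by replacing the symbol $-1$ with 0 and $+1$ with 1, we
prove that $S(N,q)$ is equivalent to the code

$S(N,q) = \left\{ X \in \{0,1\}^N : \sum_{j=1}^{N} j^i x_j = \frac{1}{2} \sum_{j=1}^{N} j^i,\ i = 0, \ldots, q-1 \right\},$

where the sums and the products are over the real field. Since
$\sum_{j=1}^{N} j^i x_j$ is an integer, if $S(N,q) \ne \emptyset$
then $\sum_{j=1}^{N} j^i$ must be even for all $i = 0, \ldots, q-1$.
Note that

$S(N,2) = \left\{ X \in \{0,1\}^N : \sum_{j=1}^{N} x_j = \frac{N}{2} \text{ and } \sum_{j=1}^{N} j x_j = \frac{N(N+1)}{4} \right\}.$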

This is nothing but a particular group-theoretic single error
correcting and all unidirectional error detecting (SEC-AUED) code
over $(\mathbb{Z}, +)$ [3]. Clearly, if $N$ is not a multiple of 4
then $S(N,2) = \emptyset$. A binary code $C$ is a $q$-th order
spectral-null code of length $N$ with $k$ information bits iff
1) $C$ is a subset of $S(N,q)$ and 2) $C$ has $2^k$ codewords. The
authors of [7] presented a recursive method to encode $k$
information bits into a second-order ($q = 2$) spectral-null code
of length

$N(k) = \tilde{n}(k) + N(2\lceil \log_2 \tilde{n}(k) \rceil - 1)$, (2)

where $\tilde{n}(k)$ is the smallest integer $n$ such that 1)

$n - k \ge \lceil \log_2 n \rceil + 1$ (3)

and 2) $n$ is a multiple of 4. Here, we give a new efficient
recursive method to encode $k$ information bits into a second-order
spectral-null code (over the alphabet $\{0,1\}$) of length

$N(k) = n(k) + N(\lceil \log_2 (n(k) \cdot (n(k) - 1)) \rceil - 1)$, (4)

where $n(k)$ is the smallest integer $n$ such that 1) there exists
a first-order spectral-null code of length $n$ with $k$ information
bits and 2) $n$ is a multiple of 4. Note that a first-order
spectral-null code is nothing but a balanced code [6]. At present,
there exist many efficient balanced code designs which require
fewer than $\log_2 k$ check bits (i.e., $n - k < \log_2 k$) to make
a $k$-bit data word balanced [1], [2], [4], [6], [8], and so the
length required by condition (3) exceeds $n(k)$ for infinitely many
values of $k$. Comparing relations (2) and (4), it is then clear
that, for these $k$'s, we get less redundant codes than those
presented in [7]. In our design methods, the data word is first
converted into a balanced word, which in turn is converted into a
second-order spectral-null codeword. One of the proposed methods is
briefly described here. Let $n$ be a multiple of 4. Given
$X \in \{0,1\}^n$, let $s(X) = \sum_{j=1}^{n} j x_j$ and
$w(X) = \sum_{j=1}^{n} x_j$. For $i = 0, \ldots, n(n-1)/2$, let
$X^{(i)}$ be the binary vector obtained from $X$ by applying the
first $i$ exchanges of adjacent components starting from the first
component. For example, when $n = 4$, $X^{(0)} = X = x_1x_2x_3x_4$,
$X^{(1)} = x_2x_1x_3x_4$, $X^{(2)} = x_2x_3x_1x_4$,
$X^{(3)} = x_2x_3x_4x_1$, $X^{(4)} = x_3x_2x_4x_1$,
$X^{(5)} = x_3x_4x_2x_1$, $X^{(6)} = x_4x_3x_2x_1$. A data word
$Y \in \{0,1\}^k$ is encoded as follows.
Encoding Procedure:

1) Balance $Y$ using one of the methods given in [1], [2], [4], [6], [8].
Let $X$ be the codeword of length $n = n(k)$ associated with $Y$.
Note that $w(X) = n/2$.

2) Compute $X^{(i_0)}$, where $i_0$ is an integer
$i \in [0, n(n-1)/2 - 1]$ (for example the smallest) such that
$s(X^{(i)}) = n(n+1)/4$.

3) Recursively apply this encoding procedure to the binary
representation of $i_0$. Let $E(i_0)$ be the codeword associated
with $i_0$.

4) Concatenate $E(i_0)$ to $X^{(i_0)}$ to get the codeword
$E(Y) = X^{(i_0)} E(i_0)$.

Decoding of the received word $X'I$ can be done easily once it is
known that $i_0 = E^{-1}(I)$. In the paper, we give similar
procedures which require only $O(n \log_2 n)$ bit operations.

Example: Let $k = 32$. Using the second construction proposed in
[8], it is possible to encode the 32 data bits into a balanced code
of length $n = n(32) = 36$. In this case, the length of the code is
$N(32) = 36 + N(\lceil \log_2(36 \cdot 35) \rceil - 1) = 36 + N(10)$.
Assume that

X = 011010111011110000000000000011111111

is the balanced encoding of a data word $Y \in \{0,1\}^{32}$ that
needs to be encoded. Since 28 is the smallest integer $i$ such that
$s(X^{(i)}) = n(n+1)/4 = 36 \cdot 37/4 = 333$, $Y$ is encoded as

$E(Y) = X^{(28)} E(28)$ = 110101110111100000000000000101111111 $E(28)$.

Using a table look-up it is possible to encode 10 information bits
into a second-order spectral-null code of length 20 (see (1)). This
means that we can encode 32 data bits into a second-order
spectral-null code of length $N(32) = 36 + N(10) = 36 + 20 = 56$
(instead of 60, which is what we would get using the method
proposed in [7]).
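
Step 2 of the encoding procedure can be reproduced on the example word with a few lines of Python. This is a minimal sketch with hypothetical function names; the order of the adjacent exchanges is inferred from the $n = 4$ illustration given earlier (the first component is moved to the end, then the new first component is moved to the second-to-last position, and so on), and the balancing step and the recursive encoding of $i_0$ are not included.

```python
def s(x):
    """Weighted sum s(X) = sum_j j * x_j with positions numbered from 1."""
    return sum(j * b for j, b in enumerate(x, start=1))


def exchange_sequence(x):
    """Yield (i, X^(i)) for i = 0 .. n(n-1)/2, where X^(i) is obtained
    from X by the first i exchanges of adjacent components."""
    x = list(x)
    n = len(x)
    i = 0
    yield i, tuple(x)
    for last in range(n - 1, 0, -1):      # each pass is one exchange shorter
        for p in range(last):
            x[p], x[p + 1] = x[p + 1], x[p]
            i += 1
            yield i, tuple(x)


def first_index(x):
    """Smallest i with s(X^(i)) = n(n+1)/4, together with X^(i),
    or None if no such i exists."""
    n = len(x)
    target = n * (n + 1) // 4
    for i, xi in exchange_sequence(x):
        if s(xi) == target:
            return i, xi
    return None


if __name__ == "__main__":
    X = [int(c) for c in "011010111011110000000000000011111111"]
    i0, Xi0 = first_index(X)
    print(i0, "".join(map(str, Xi0)))
```

Each exchange changes $s$ by at most one, and the full sequence of exchanges reverses the word, which is why a balanced word always passes through the target value $n(n+1)/4$.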
Abstract — In this paper, we propose Viterbi decoding
based on Levenshtein distance. We show that
Levenshtein distance is a suitable metric for a channel
in which both substitution errors and insertion/deletion
errors occur. Our proposal makes it possible to
continue Viterbi decoding without re-synchronization
even if some insertion/deletion errors occur in the
channel.

I.  Introduction 

Recently, high recording density has been required in digital
recording systems. However, as recording density increases, the
error rate also increases. In particular, it is known that burst
errors called synchronization errors, caused by insertion/deletion
errors, occur in optical recording systems more frequently than in
other disk systems. In this field, Partial Response Maximum
Likelihood (PRML) detection is currently the focus of attention. In
PRML systems, Viterbi decoding is used to realize maximum
likelihood decoding. In Viterbi decoding based on Hamming or
Euclidean distance, even a single insertion/deletion error makes it
impossible to continue decoding without re-synchronizing, because
an insertion/deletion error measured by Hamming or Euclidean
distance causes a burst error called a synchronization error. In
this paper, we propose Viterbi decoding based on Levenshtein
distance [1].

II.  Channel  Model 

In this section, we describe a binary channel model in
which not only substitution errors but also insertion/deletion
errors occur. In this paper, we call such a channel $C_{sid}$. Let
$p$ be the probability of substitution errors, $q_i$ the
probability of insertion errors, and $q_d$ the probability of
deletion errors in $C_{sid}$. For convenience, we assume
that $q_i = q_d = q$.

III.  Levenshtein  Distance 

Definition 1 Let $x$ and $y$ be two finite sequences of symbols
from a given alphabet. If $x$ can be transformed into $y$ by the
substitution of $E_i$ symbols, the insertion of $f_i$ symbols and
the deletion of $g_i$ symbols, then the Levenshtein distance (LD)
between $x$ and $y$ is defined by

$LD(x, y) = \min_i (E_i + f_i + g_i).$ (1)

Notice that Levenshtein distance satisfies the three axioms of a
metric.

Levenshtein distance is computed by using a graph that we
call an LD diagram (see Fig. 1).
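
The minimization in (1) is usually carried out by dynamic programming over a grid whose moves correspond to substitution, insertion and deletion. The Python sketch below uses the standard textbook recurrence, which is assumed here to play the role of the LD diagram; it is not taken from the paper.

```python
def levenshtein(x, y):
    """Minimum number of substitutions, insertions and deletions
    needed to transform sequence x into sequence y."""
    prev = list(range(len(y) + 1))        # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        cur = [i]                         # deleting the first i symbols of x
        for j, b in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # One deleted bit shifts every later position, so the Hamming distance
    # of the aligned prefixes grows while the Levenshtein distance stays 1.
    print(levenshtein("10110", "0110"))
```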

This  work  was  presented  in  part  at  the  IEICE  Technical  Report, 
July  15,  1995. 


Figure  1:  LD  diagram  for  n  =  5 

IV.  Conditional  Probability 

In this section, we consider the conditional probability
$P(y_t|w_t)$ in $C_{sid}$. In the Binary Symmetric Channel (BSC),
the conditional probability $P(y_t|w_t)$ is given by the following
equation:

$P(y_t|w_t) = p^{E}(1-p)^{n-E}$ (2)

where $E$ is the number of substitution errors that occur in the
BSC. In this case, $-\log P(y_t|w_t)$ is, up to an additive
constant, proportional to the number of substitution errors, that
is, to the Hamming distance.

In $C_{sid}$, there are many ways in which $w_t$ can change into
$y_t$ through both substitution errors and insertion/deletion
errors. The number of such ways is given by the number of paths
in the LD diagram. Thus, the conditional probability $P(y_t|w_t)$
is given by the sum of the probabilities of the paths in the LD
diagram:

$P(y_t|w_t) = \sum_i p^{E_i}(1-p)^{l-E_i} q^{f_i} q^{g_i}$ (3)

where the sum runs over the paths $i$ of the LD diagram,
$k = f_i = g_i$, $l = n - k$, and $E_i$, $f_i$ and $g_i$ are the
numbers of substitution errors, insertion errors and deletion
errors in each path, respectively. In this case, what is
proportional to $-\log P(y_t|w_t)$? Let $d_i = E_i + f_i + g_i$,
let $p_i$ be the path labeled $i$, and let $P(p_i) = m$ be the
probability of $p_i$. Assume that $d_i$ is the minimum value over
all $i$. Here, we consider $P(p_j)$.

In the case that $E_j = E_i + 1$ and $d_j = d_i + 1$,
$P(p_j) \approx p\,m$. In the case that $f_j = f_i + 1$,
$g_j = g_i + 1$ and $d_j = d_i + 2$, $P(p_j) \approx q^2 m$. Thus,
if $p$ and $q$ are relatively small, $P(p_j) \ll P(p_i)$. It can
therefore be said that the value of $P(y_t|w_t)$ in $C_{sid}$
mainly depends on $P(p_i)$. It follows that
$-\log P(y_t|w_t) \propto \min_i (E_i + f_i + g_i) = d_{LD}(y_t, w_t)$,
which shows that it is possible to continue Viterbi decoding
appropriately by using Levenshtein distance as the metric.

V.  Conclusion 

In this paper, we have shown that Levenshtein distance is a
suitable metric for $C_{sid}$. This indicates that it is possible
to continue Viterbi decoding in $C_{sid}$ by using Levenshtein
distance as the metric.
Abstract — We study the maximum zero-run length,
$L_{max}$, of cosets of convolutional codes, and show that
an associated block subcode to a large extent determines
$L_{max}$.

I.  Introduction 

A communication system or storage system may use a coset
of a binary convolutional code for both symbol synchronization
and error control. To achieve symbol synchronization,
the coset must have a short maximum zero-run length, $L_{max}$.
The shortest values of $L_{max}$ can be found in the class of
convolutional codes of rate $(n-r)/n$ for which at least one row of
a minimal parity check matrix is nonpolynomial [1]. We focus
on this class. Each convolutional code $C$ in the class contains
an associated block subcode $C_B$, consisting of the union of the
sets of binary labels in the convolutional code trellis. For any
binary vector $p$ of length $n$, let $\bar{p}$ denote the sequence
obtained by repeating $p$ indefinitely. We consider some coset
$C + \bar{p}$, obtained by adding some vector $p$ to every binary
label in the convolutional code trellis. It is convenient to
express $L_{max}$ as $L_{max} = \max\{LR, PR\}$, where the label
run, $LR$, is defined as the largest number of intermediate zeros
between two ones in any label of $C + \bar{p}$ (of Hamming weight
at least two), and the path run, $PR$, is the largest number of
consecutive zeros in any sequence of $C + \bar{p}$ consisting of
two or more consecutive coset labels.

II.  The  Connection  between  Zero-Run  Lengths 
of  Convolutional  Code  Cosets  and  Associated 
Block  Codes 

We consider an $(n, n-r)$ convolutional code $C$ defined by a
parity check matrix $H(D)$. The maximum degree of the $i$-th
row of $H(D)$ is denoted $\nu_i$. Assume that the first $r_B$ rows
of $H(D)$ are nonpolynomial, and that the remaining $r - r_B$ rows
are sorted according to increasing row degree. That is,
$\nu_1 = \ldots = \nu_{r_B} = 0 < \nu_{r_B+1} \le \ldots \le \nu_r$.
The associated block subcode $C_B$ is the $(n, n - r_B)$ block code
defined by the submatrix $H_B$ which consists of the first $r_B$
rows of $H(D)$. Let $PR$ and $LR$ be the path run and label run,
respectively, of the coset $C + \bar{p}$. If we view $C_B$ as a
zero constraint length (or one-state) convolutional code, we can
let $PR_B$ and $LR_B$ be the "path run" and label run of
$C_B + \bar{p}$. Then we can show the following results.

Lemma 1 $PR \le PR_B$. Further, $PR = PR_B$ if $\nu_{r_B+1} \ge 2$.

Lemma 2 $LR = LR_B$.

1  This  work  was  supported  by  the  Norwegian  Research  Council 
(NFR)  under  contract  numbers  107542/410  and  107623/420. 


III. Block Code Zero-Run Lengths

For $1 \le i \le r_B$, let $\lambda_i$ and $\rho_i$ be the first
and last position where the $i$-th row of $H_B$ is nonzero. Assume,
without loss of generality, that the rows of $H_B$ are sorted
according to $\rho_i \le \rho_{i+1}$, $1 \le i < r_B$.

Lemma 3 For the coset with all-one syndrome,

$LR_B \le \max_{1 \le i \le r_B+1} \{ \rho_i - \max_{0 \le j < i} \{\lambda_j\} - 1 \},$

where, for convenience, we define $\lambda_0 = 1$ and
$\rho_{r_B+1} = n$.

Lemma 4 For the coset with all-one syndrome,

$PR_B \le n - \max_{1 \le i \le r_B} \{\lambda_i\} + \rho_1 - 1.$
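
Lemmas 3 and 4 can be checked on toy examples by enumerating the labels of a block-code coset. The Python sketch below is an illustration under stated assumptions: the parity check matrices are made up for the purpose, the function names are hypothetical, and the path run of the one-state code is computed as the longest zero run across the boundary between two labels (largest number of trailing zeros plus largest number of leading zeros), which presumes that the coset contains no all-zero label.

```python
from itertools import product


def coset_labels(H, syndrome):
    """All binary n-tuples x with H x^T equal to the given syndrome (mod 2)."""
    n = len(H[0])
    return [x for x in product((0, 1), repeat=n)
            if all(sum(h * b for h, b in zip(row, x)) % 2 == s
                   for row, s in zip(H, syndrome))]


def label_run(labels):
    """Largest number of zeros between two consecutive ones in any label."""
    best = 0
    for x in labels:
        ones = [i for i, b in enumerate(x) if b]
        for a, c in zip(ones, ones[1:]):
            best = max(best, c - a - 1)
    return best


def boundary_run(labels):
    """Longest zero run across the boundary of two consecutive labels."""
    lead = max(x.index(1) for x in labels)
    trail = max(x[::-1].index(1) for x in labels)
    return lead + trail


def lemma_bounds(H):
    """Right-hand sides of Lemma 3 and Lemma 4 for the rows of H."""
    n = len(H[0])
    spans = sorted(((row.index(1) + 1, n - row[::-1].index(1)) for row in H),
                   key=lambda sp: sp[1])            # sort rows by rho_i
    lam = [1] + [sp[0] for sp in spans]             # lambda_0 = 1
    rho = [None] + [sp[1] for sp in spans] + [n]    # rho_{r_B + 1} = n
    lr = max(rho[i] - max(lam[:i]) - 1 for i in range(1, len(rho)))
    pr = n - max(lam[1:]) + rho[1] - 1
    return lr, pr


if __name__ == "__main__":
    H = [[1, 1, 1, 0]]                  # one row with t = 1 trailing zero
    labels = coset_labels(H, [1])
    print(label_run(labels), boundary_run(labels), lemma_bounds(H))
```

For the single-row example both bounds are met with equality, and the resulting zero-run length agrees with the value $2n - 2 - t$ appearing in Theorem 1 of the next section.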

IV.  Convolutional  Code  Zero-Run  Lengths 
Definition 1 Let $V$ be the class of $(n, n-2, (0,\nu))$,
$\nu \ge 2$, binary convolutional codes with the first row of the
parity check matrices equal to

    1 ... 1 0 ... 0     (t trailing zeros)

where $t$ is the number of trailing zeros, $0 \le t \le n-2$. □

Theorem 1 Let $C$ be a convolutional code in the class $V$. Any
coset $C + \bar{p}$ for which the first syndrome sequence is equal
to the all-one sequence has the least $L_{max} = 2n - 2 - t$ for
any period $n$ coset representative.

Definition 2 For $r \ge 3$, let $E$ be the class of
$(n, n-r, (0, 0, \nu_3, \ldots, \nu_r))$ binary convolutional codes
for which the first two rows ($H_B$) of the parity check matrices
$H(D)$ are of the form

    ? ... ? 1 0 ... 0               (t_1 trailing zeros)
    0 ... 0 1 ? ... ? 1 0 ... 0     (l_2 leading zeros, t_2 trailing zeros)

The question marks denote unknown binary digits, $t_1$ denotes
the number of trailing zeros in the first row, and $l_2$ and $t_2$
denote the numbers of leading and trailing zeros in the second
row. It is assumed that $1 \le l_2 + t_2 \le n - 2$,
$l_2 \ge t_2$ and $t_2 \le t_1 \le n - 2$. □

Theorem 2 Let $C$ be a convolutional code in the class $E$. Any
coset $C + \bar{p}$ for which the first and second syndrome
sequences are equal to the all-one sequence has
$L_{max} \le \max\{n - 2 - t_2,\ 2n - 2 - l_2 - t_1\}$.

We  also  show,  by  way  of  examples,  that  classes  V  and  E 
contain  codes  with  excellent  error-correcting  properties. 
Abstract—A new modulation coding technique, called Integer
Multiple  Mark  Modulation  (IMMM)  is  proposed.  IMMM  codes 
generate  asymmetrical  runlength  limited  sequences  with  spectral 
nulls  in  the  power  spectrum,  whose  positions  are  related  to  a 
specific  runlength  constraint.  This  coding  technique  can  be  used 
for  any  channel  requiring  specific  spectral  nulls  in  the  code 
spectrum,  such  as  partial-response  optical  recording. 

I.  Introduction 

Numerous  applications  of  digital  data  transmission  and 
storage  systems  require  the  use  of  runlength  limited  (RLL) 
codes  with  certain  defined  spectral  properties.  We  investigate 
a  method  to  furnish  codes  with  spectral  nulls  in  the  power 
spectrum  (except  DC)  of  the  encoded  sequence.  An 
application  of  such  codes  is  providing  a  gap  for  the  insertion 
of  auxiliary  pilot  tones,  used  for  positioning  the  servo  of 
magnetic  or  optical  disc  recorders  [1].  In  another  application, 
codes  with  spectral  nulls  in  the  power  spectrum,  which 
coincide  with  the  nulls  of  the  transfer  function  of  the  channel, 
are  used  in  partial -response  optical  recording  systems  [2]. 

II.  The  IMMM  Coding  Technique 

The notation for asymmetrical runlength limited (ARLL)
sequences, as introduced by Karabed and Siegel [3], will be
used. The class of binary, non-return-to-zero (NRZ), ARLL
channels can be defined by the 4-tuple $(d', k', e', m')$, where
$d'$ and $k'$ are the minimum and maximum runlength of 0's,
respectively, and $e'$ and $m'$ are the minimum and maximum
runlength of 1's, respectively. The Integer Multiple Mark
Modulation (IMMM) coding technique, which we introduce,
requires the runlengths of 1's to be of the form $je'$,
$1 \le j \le m'/e'$ (i.e., integer multiples of $e'$). The 1's are
usually referred to as written marks in optical storage, hence the
name "Integer Multiple Mark Modulation".

Even Mark Modulation (EMM) was introduced by Karabed
and Siegel [3] to improve the performance of input-restricted
partial-response optical recording channels. EMM satisfies the
runlength constraint $(d', k', e', m') = (1, \infty, 2, \infty)$ and
the requirement that the written marks are of even length [3]. The
EMM coding technique is therefore a special case of the
Integer Multiple Mark Modulation technique.

The IMMM coding technique has the interesting property
that it has spectral nulls at rational submultiples of the
symbol frequency, the positions of which can be chosen
merely by adjusting the minimum runlength of 1's:

Proposition 1

An IMMM $(d', k', e', m')$ sequence, with $k' \ge d' \ge 1$ and
$m' \ge e' \ge 1$, will contain spectral nulls at the frequencies
$f = r f_s / e'$, with $r \in \{1, 2, 3, \ldots, e'-1\}$ and where
$f_s$ is the symbol frequency. □


To  calculate  the  channel  capacity  for  these  sequences,  the 
following  proposition  can  be  used: 

Proposition 2

The noiseless channel capacity for a binary IMMM
$(d', k', e', m')$ NRZ input-restricted channel with
$k' \ge d' \ge 1$ and $m' \ge e' \ge 1$ is given by
$H = \log_2 \lambda$, where $\lambda$ is the largest real root of
the characteristic equation:

$\lambda^{e'+k'+m'+1} - \lambda^{e'+k'+m'} - \lambda^{k'+m'+1} + \lambda^{k'+m'} - \lambda^{k'+m'-d'+1} + \lambda^{k'-d'+1} + \lambda^{m'} - 1 = 0.$
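
The capacity in Proposition 2 can be evaluated numerically. The Python sketch below rests on an assumption that should be checked against the full paper: with $D = 1/\lambda$, the characteristic equation is taken to be equivalent to the run-length identity $\left(\sum_{i=d'}^{k'} D^i\right)\left(\sum_{j=1}^{m'/e'} D^{je'}\right) = 1$, which expresses that a sequence is built from alternating runs of 0's (lengths $d'$ to $k'$) and runs of 1's (multiples of $e'$ up to $m'$). The root is found by bisection; the function names are illustrative.

```python
from math import log2, sqrt


def immm_capacity(d, k, e, m, iters=200):
    """Capacity log2(lambda) of the IMMM (d, k, e, m) constraint, found by
    bisection on f(D) = (sum_{i=d..k} D^i)(sum_{j=1..m/e} D^(j e)) - 1."""
    if m % e:
        raise ValueError("m must be an integer multiple of e")

    def f(D):
        zeros = sum(D ** i for i in range(d, k + 1))
        marks = sum(D ** (j * e) for j in range(1, m // e + 1))
        return zeros * marks - 1.0

    lo, hi = 0.0, 1.0
    if f(hi) <= 0.0:          # only one admissible sequence: zero capacity
        return 0.0
    for _ in range(iters):    # f is increasing on (0, 1)
        mid = (lo + hi) / 2
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return -log2(hi)


def characteristic_polynomial(lam, d, k, e, m):
    """Left-hand side of the characteristic equation of Proposition 2."""
    return (lam ** (e + k + m + 1) - lam ** (e + k + m) - lam ** (k + m + 1)
            + lam ** (k + m) - lam ** (k + m - d + 1) + lam ** (k - d + 1)
            + lam ** m - 1)


if __name__ == "__main__":
    # Long maximum runlengths approximate the EMM constraint (1, inf, 2, inf).
    print(immm_capacity(1, 200, 2, 200))
```

With long maximum runlengths the value approaches $\log_2$ of the golden ratio, about 0.694, which is the limit obtained from the same identity when $k'$ and $m'$ are unbounded.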


The  generating  function  is  a  rational  function  that  can  be 
expanded  into  a  power  series  such  that  the  coefficient  of  each 
dummy  variable  equals  the  number  of  possible  unique 
constrained  sequences  where  the  length  of  the  sequences  is 
given  by  the  power  of  the  dummy  variable.  We  present  the 
generating  function  for  the  number  of  IMMM  (d\  k',  e  ,  m') 
sequences  of  length  i3,  for  which  an  arbitrary  concatenation 
of  code  words  also  satisfies  the  channel  constraints. 
Proposition  3 

The  generating  function  for  the  number  of  self- 
concatenable  code  words  for  IMMM  ( d\  k\  e  ,  ra')  sequences 
of  length  i3,  where  $  >  d'+e,  is  given  by: 

X)  [(l-x)(l-xe)  ~(xd  -xk  ±])(xe' -xm  +e)](l-x) 


if  k'-d'  > 

n*)=- 


_,  and 


(xd'-xk'+l)(l-xje'+e)(xe'-xm'-je'+e) 


[(l-x)(l-xe)-(xd  -xk+l)(xe  -xm+em-xe) 


,,  m'-e'  , 

if  k  -d  <  - ,  and 


where  i 


e 

k!-dr 


and  j  = 


m-  e 


2  ef 


The  above  proposition  can  be  used  to  determine  the 
number  of  code  words  when  developing  IMMM  block  codes. 
I. Introduction

Recently, a new class of unequal error protection codes [1]
which protects the fixed-byte in computer words from errors
has been proposed [2]. Here, the fixed-byte, which
stores valuable and important information such as the address
in communication messages or the pointer in database
words, means a cluster of information of $b$-bit
length whose position in the word is determined in advance.

This paper proposes an extended class of optimal fixed-byte
error protection codes which protects the fixed-byte
from single-bit errors outside the fixed-byte as well as
any errors within the fixed-byte occurring simultaneously.
This class of codes is called Single-bit plus Fixed $b$-bit byte
Error Correcting codes, i.e., (S+F$_b$)EC codes.

II.  Preliminaries 

Theorem 1 A binary linear code, described by the parity
check matrix $H$, corrects all single-bit plus fixed-byte
errors if and only if

(a) $e \cdot H^T \ne 0$ for $\forall e \in \{E_1 \cup E_2\}$

(b) $e_i \cdot H^T \ne e_j \cdot H^T$ for $\forall e_i, \forall e_j \in E_1$, $e_i \ne e_j$

(c) $e_p \cdot H^T \ne e_q \cdot H^T$ for $\forall e_p, \forall e_q \in E_2$, $e_p \ne e_q$

(d) $e_i \cdot H^T \ne e_p \cdot H^T$ for $\forall e_i \in E_1$ and $\forall e_p \in E_2$

(e) $(e_i + e_p) \cdot H^T \ne (e_j + e_q) \cdot H^T$ for $\forall e_i, \forall e_j \in E_1$
and $\forall e_p, \forall e_q \in E_2$, $e_i \ne e_j$, $e_p \ne e_q$,

where $H^T$ is the transpose of $H$, $E_1$ is the error set
caused by single-bit errors outside the fixed-byte in the
word, and $E_2$ the error set caused by all possible errors
in the fixed-byte.

Theorem 2 The maximal code length of an $(N, N-r)$
(S+F$_b$)EC code is given by

$N_{max} = 2^{r-b} + b - 1.$

Thus, the maximum information-bit length $K_{max}$ can be
expressed as $K_{max} = 2^x - x - 1$, where $x = r - b$. Table 1
lists $K_{max}$ for check-bit length $r = b + x$.

Table 1: Maximum information-bit length of (S+F$_b$)EC codes

    r        b+3   b+4   b+5   b+6   b+7   b+8
    K_max      4    11    26    57   120   247

III. Code Construction

Without loss of generality, the fixed-byte is assumed to be
located at the beginning of the word and the check-bits at the
end of the word. Here, the $H$ matrix of the code is divided
into three submatrices as shown in (3-1). The submatrix $H_F$
corresponds to the fixed-byte of $b$-bit length, the submatrix
$I_r$ to the check-bits of $r$-bit length, and the intermediate
submatrix $H_Q$ to the remaining $(N - b - r)$ bits.

$H = [\, H_F \mid H_Q \mid I_r \,]$ (3-1)

The following $H$ matrix shows the (S+F$_b$)EC code satisfying
the bound on code length shown in Theorem 2:

$H = [\, H_F \mid H_Q \mid I_r \,]$, where
$H_F = \begin{bmatrix} I_b \\ P \end{bmatrix}$ and
$H_Q = \begin{bmatrix} O \\ Q \end{bmatrix}$,

$I_b$ ($I_r$): $b \times b$ ($r \times r$) identity matrix; $O$: zero matrix.
$P$: $(r-b) \times b$ matrix whose $b$ distinct binary column
vectors have weight larger than or equal to two.
$Q$: matrix having all possible nonzero $(r-b)$-bit binary
columns excluding those in $P$ and weight-one columns.

Let the upper $b$ bits of the syndrome $S$ be $S_F$, and let the
matrix $G_F$ be defined as $G_F = [\, P \mid I_{r-b} \,]$.

Using $S_F$ and $G_F$, decoding of the (S+F$_b$)EC
code is performed according to Table 2.

Table 2: Decoding of (S+F$_b$)EC codes

    $S = 0$:
        error free
    $S \ne 0$ and $S$ = one column vector in $H$:
        corresponding single-bit error correction
    $S \ne 0$, $S \ne$ one column vector in $H$, and $S \cdot G_F^T = 0$:
        fixed-byte error correction
        [byte error pattern: $S_F$]
    $S \ne 0$, $S \ne$ one column vector in $H$, $S \cdot G_F^T \ne 0$, and
    $S \cdot G_F^T$ corresponds to one column vector in $Q$ or $I_{r-b}$ in $H$:
        corresponding one-bit error correction and fixed-byte
        error correction
        [byte error pattern: $S_F$]
    $S \ne 0$, $S \ne$ one column vector in $H$, $S \cdot G_F^T \ne 0$, and
    $S \cdot G_F^T$ corresponds to one column vector in $P$, e.g., the
    $l$-th column vector:
        $l$-th ($1 \le l \le b$) check-bit error correction and
        fixed-byte error correction
        [byte error pattern: $S_F + (0 \cdots 0\,1\,0 \cdots 0)$, with the
        1 in the $l$-th position]

IV. Conclusion

This paper has proposed an extended class of optimal
fixed-byte error protection codes, and has demonstrated
the bounds on code length and the code construction
method.
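
The construction of Section III can be checked by brute force for small parameters. The Python sketch below is an illustration, not the authors' implementation: columns of $H$ are stored as $r$-bit integers with the upper $b$ bits in the high positions, the columns of $P$ are simply the first $b$ candidates of weight at least two (any other choice of distinct columns would serve), and all function names are hypothetical. It builds $H = [H_F \mid H_Q \mid I_r]$ and lists the syndromes of every error pattern consisting of an arbitrary fixed-byte error combined with at most one bit error outside the fixed-byte.

```python
def weight(v):
    """Hamming weight of the binary representation of v."""
    return bin(v).count("1")


def build_columns(b, r):
    """Columns of H = [H_F | H_Q | I_r] as r-bit integers."""
    x = r - b
    heavy = [v for v in range(1, 1 << x) if weight(v) >= 2]
    if b > len(heavy):
        raise ValueError("not enough columns of weight >= 2 for P")
    P, Q = heavy[:b], heavy[b:]
    cols = [((1 << (b - 1 - l)) << x) | p for l, p in enumerate(P)]  # [I_b ; P]
    cols += Q                                                        # [O ; Q]
    cols += [1 << (r - 1 - m) for m in range(r)]                     # I_r
    return cols


def correctable_syndromes(b, r):
    """Code length N and the syndromes of all error patterns made of any
    fixed-byte error (including none) plus at most one outside bit error."""
    cols = build_columns(b, r)
    N = len(cols)
    out = []
    for byte in range(1 << b):
        s_byte = 0
        for l in range(b):
            if (byte >> l) & 1:
                s_byte ^= cols[l]
        out.append(s_byte)                 # fixed-byte error only
        for pos in range(b, N):
            out.append(s_byte ^ cols[pos])  # plus one bit outside the byte
    return N, out


if __name__ == "__main__":
    N, synd = correctable_syndromes(2, 5)
    print(N, len(synd), len(set(synd)))
```

When all listed syndromes are distinct, every such error pattern can be identified from its syndrome, which is the requirement expressed by Theorem 1.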
Abstract — An analysis of the Generalized Cross
Constellation  (GCC)  is  presented  and  a  new  perspec¬ 
tive  on  its  coding  algorithm  is  described.  We  show 
how  the  GCC  can  be  used  to  address  generic  sets  of 
symbol  points  in  any  multidimensional  space  through 
an  example  based  on  the  matched  spectral  null  cod¬ 
ing  used  in  magnetic  recording  devices.  We  also  prove 
that  there  is  a  forbidden  rate  region  of  fractional  cod¬ 
ing  rates  that  are  practically  unrealizable  using  the 
GCC  construction.  We  introduce  the  idea  of  a  con¬ 
stellation  tree  and  show  how  its  decomposition  can  be 
used  to  design  GCC’s  matching  desired  parameters. 
Following  this  analysis,  an  algorithm  to  design  the  op¬ 
timal  rate  GCC  from  a  restriction  on  the  maximum 
size  of  its  constellation  signal  set  is  given,  and  a  for¬ 
mula  for  determining  the  size  of  the  GCC  achieving  a 
desired  coding  rate  is  derived.  We  finish  with  an  up¬ 
per  bound  on  the  size  of  the  constellation  expansion 
ratio. 


I.  Introduction 

The $2N$-dimensional generalized cross constellation (GCC)
selects a block of $N$ 2-dimensional points from among a family
of simply defined constituent subconstellations by first choosing
a constrained sequence of these subconstellations and then
selecting an individual channel symbol from each chosen
subconstellation. This construction reduces the multidimensional
addressing problem to a series of $N$ 2-dimensional
subconstellation mappings [1]. Furthermore, since the sequence
constraints select the distinct subconstellations with different
probabilities, the GCC also makes it possible to reduce average
transmitted signal power. While this addressing technique
can be applied to channel constellations of any type, generalized
cross constellations have hitherto found application in
QAM modems.

Since it is possible to encode a fractional average bitrate
into each channel symbol and also possible to use a constellation
whose total cardinality is not restricted to an integer
power of 2, generalized cross constellations are also powerful
tools for maximizing the encoding rate of discrete communications
channels. These qualities are especially attractive in
coding for channels, other than QAM modems, for which there
is a predetermined symbol alphabet of size $2^n < N < 2^{n+1}$.

An example of such a discrete communications channel is
the Partial Response Maximum Likelihood (PRML) magnetic
recording channel. An elementary approach to coding for
the PRML channel involves the construction of DC-free block
codes which maintain all of the advantages of the MSN trellis
codes without the problems of error propagation and decoder
complexity [2]. A DC-free block code is a set of balanced binary
N-tuples, each of which has an equal number of zeros
and ones. Therefore, the codewords making up a DC-free code
must be selected from among the $\binom{N}{N/2}$ balanced binary
N-tuples and should be chosen to allow a simple addressing
scheme from binary user data. Difficulties occur because
$\binom{N}{N/2}$ is never an integral power of 2. Since a simple
look-up-table addressing scheme only works for constellations
of size $2^k$ for $k \in \mathbb{Z}^+$, a DC-free block code must discard
$\binom{N}{N/2} - 2^{\lfloor \log_2 \binom{N}{N/2} \rfloor}$ of the $\binom{N}{N/2}$ possible codewords
and encode only $\lfloor \log_2 \binom{N}{N/2} \rfloor$ user bits per channel
symbol.

For example, consider the case N = 10. Since $\binom{10}{5} = 252$,
the optimal addressing method would encode an average
of $\log_2(252) \approx 7.98$ bits per channel symbol. Unfortunately,
due to the integral power-of-two restriction, a simple look-up-table
addressing scheme only permits a codebook of size
$2^{\lfloor \log_2(252) \rfloor} = 128$ codewords and can therefore encode only 7
bits per symbol. Using a GCC, however, a codebook using
240 of the available 252 DC-free binary 10-tuples is possible,
and  a  rate  of  7|  bits  per  channel  symbol  can  be  achieved. 
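
The counting argument in this example can be checked with a short sketch. The function name and the returned keys are illustrative only and do not come from the paper; the sketch covers only the look-up-table baseline, not the GCC addressing itself.

```python
from math import comb, log2


def balanced_codebook_stats(n: int) -> dict:
    """Count balanced binary n-tuples and the look-up-table codebook they allow."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    total = comb(n, n // 2)            # number of balanced n-tuples
    bits = total.bit_length() - 1      # floor(log2(total)), exact for integers
    table_size = 1 << bits             # largest power of two not exceeding total
    return {
        "balanced_words": total,
        "ideal_bits": log2(total),     # rate if every balanced word were addressable
        "table_bits": bits,            # rate of a simple look-up table
        "table_size": table_size,
        "discarded": total - table_size,
    }


stats = balanced_codebook_stats(10)
print(stats)
```

For N = 10 this reproduces the 252 balanced words, the 128-word look-up table and the 7-bit rate quoted above, and shows that 124 balanced words go unused by the table.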


II.  Theoretical  Results 

Theorem  .1  Given  a  generalized  cross  constellation ,  C/3, 
with  average  encoding  rate  f3  =  n  bits  per  symbol,  the 

total  cardinality  of  Cp  is 

i^i =2n-  n  iv  a) 

P'€P' 

where  Rp  =  ,  P'  =  {m—p\p  E  P},  and  the  set  P  is  defined 

as  the  ordered  set  of  indices,  i,  in  the  binary  decomposition  of 

d.  □ 


Theorem .2 There exists a region of values for parameters
d, m and n, $d < 2^m$, for which the associated generalized cross
constellation requires more than $2^{n+1}$ channel symbols. □
Abstract  —  New  linear  symbol-spreading  strategies
for  efficient  single-  and  multi-user  communication  in 
environments  subject  to  fading  due  to  time-varying 
multipath  are  introduced.  For  given  power,  band¬ 
width,  and  delay  constraints,  these  new  systems  sig¬ 
nificantly  reduce  the  computation  required  to  achieve 
a  prescribed  level  of  performance.  Several  aspects  of 
these  systems  and  their  performance  will  be  devel¬ 
oped. 

I.  Spread-Response  Precoding 

For  single-user  or  frequency-division  multiplexed  wireless 
systems,  we  first  develop  a  technique  we  refer  to  as  “spread- 
response  precoding,”  which  replaces  the  interleaving  typically 
used  in  conjunction  with  coding  in  such  systems.  In  tradi¬ 
tional  bandwidth-limited  systems  for  communication  over  fad¬ 
ing  channels,  coding  is  used  to  combat  the  effects  of  both  ad¬ 
ditive  receiver  noise  and  fading.  Furthermore,  achieving  high 
performance  generally  requires  the  use  of  codes  with  a  large 
number  of  states.  However,  the  computational  requirements 
inherent  in  the  use  of  such  large  codes  typically  preclude  their 
use  in  practice.  With  the  new  systems  described  in  this  pa¬ 
per,  much  of  the  burden  of  combatting  fading  is  shifted  to  the 
spread-response  precoder,  allowing  shorter  codes  to  be  used 
for  a  given  level  of  performance.  Since  this  precoding  (and 
postcoding)  is  implemented  using  linear  filtering,  the  net  re¬ 
sult  is  a  significant  reduction  in  computational  complexity  in 
the  system. 

The  precoder  is  implemented  using  either  linear  time- 
invariant or periodically time-varying filters. The key
characteristics of the precoding filters are that they are orthonormal
or near-orthonormal transformations of the input symbols,
and that their impulse response energy is widely spread
in  time.  This  spreading  allows  each  coded  symbol  to  see,  in 
an  appropriate  sense,  the  average  characteristics  of  the  chan¬ 
nel.  In  fact,  from  the  perspective  of  the  coded  symbol  stream, 
spread-response  precoding  asymptotically  transforms  an  arbi¬ 
trary  Rayleigh  fading  channel  into  a  nonfading,  simple  white 
marginally  Gaussian  noise  channel  in  which  intersymbol  inter¬ 
ference  is  transformed  into  a  comparatively  more  benign  form 
of  additive  white  noise  that  is  uncorrelated  with  the  input. 

II.  Spread-Signature  CDMA 

In  the  multiuser  case,  spread-response  precoding  general¬ 
izes  to  a  new  class  of  orthogonal  code-division  multiple-access 
(CDMA)  systems  for  efficient  communication  in  environments 


subject  to  multipath  fading  phenomena.  The  key  charac¬ 
teristic  of  these  new  systems,  which  we  refer  to  as  “spread- 
signature  CDMA”  systems,  is  that  the  associated  signature 
sequences  are  significantly  longer  than  the  interval  between 
symbols.  Using  this  approach,  precoding  is  embedded  into 
the  signature  sequences  in  the  system,  so  that  the  transmis¬ 
sion  of  each  symbol  of  each  user  is,  in  effect,  spread  over  a  wide 
temporal  and  spectral  extent,  which  is  efficiently  exploited  to 
combat  the  effects  of  fading. 

Analogous  to  the  single-user  case,  spread-signature  CDMA 
systems  asymptotically  transform  the  multiuser  Rayleigh  fad¬ 
ing  channel  into  a  collection  of  decoupled  quasi-Gaussian 
channels.  Optimizing  the  signal-to-noise  ratio  in  the  result¬ 
ing  quasi-Gaussian  channel  with  respect  to  the  choice  of  a 
linear equalizer leads to minimum mean-square error type
equalizers. 

An  optimum  class  of  spread-signature  sets  for  this  appli¬ 
cation  is  developed  out  of  multirate  system  theory,  and  effi¬ 
cient  implementations  are  described.  Estimates  of  the  capac¬ 
ity  and  uncoded  bit-error  rate  characteristics  are  computed 
with  these  optimized  systems  and  compared  with  those  of 
more  traditional  CDMA  systems.  The  performance  advan¬ 
tages  appear  substantial  for  practical  systems.  Furthermore, 
the  use  of  these  new  systems  requires  no  additional  power  or 
bandwidth,  and  is  attractive  in  terms  of  computational  com¬ 
plexity,  robustness,  and  delay  considerations.  Some  remaining 
challenges  inherent  in  their  use — including  managing  peak-to- 
average  power  requirements  and  developing  suitable  timing 
recovery  strategies — are  also  described. 

A detailed development of these results is presented in [1], [2].

Acknowledgements 

The  author  is  grateful  to  N.  S.  Jayant,  C-E.  Sundberg,  N. 
Seshadri, J. Kovacevic, M. Sondhi, A. Odlyzko, E. Telatar, A.
Wyner,  and  S.  Shamai  (Shitz),  all  at  AT&T  Bell  Laborato¬ 
ries,  for  many  helpful  discussions,  comments  and  suggestions 
regarding  this  work. 

References 

[1]  G.  W.  Wornell,  “Spread-Response  Precoding  for  Communica¬ 
tion  over  Fading  Channels,”  submitted  to  IEEE  Trans.  Inform. 
Theory ,  Apr.  1994. 

[2]  G.  W.  Wornell,  “Spread-Signature  CDMA:  Efficient  Multiuser 
Communication  in  the  Presence  of  Fading,”  to  appear  in  IEEE 
Trans.  Inform.  Theory. 


1This work has been supported by AT&T Bell Laboratories,
where  the  author  was  on  leave  during  the  1992-93  academic  year, 
and  in  part  by  the  Advanced  Research  Projects  Agency  monitored 
by  ONR  under  Contract  No.  N00014-93-1-0686,  the  National  Sci¬ 
ence  Foundation  under  Grant  No.  MIP-9502885,  and  the  Office  of 
Naval  Research  under  Grant  No.  N0014-95-1-0834. 




Information  Theoretic  Limits  on  Communication  Over  Multipath 

Fading  Channels 

Richard  Buz1 

Communications  Research  Centre,  3701  Carling  Ave.,  P.O.  Box  11490,  Station  H 
Ottawa,  Ontario,  Canada,  K2H  8S2 


Abstract  —  Limits  on  the  rate  of  reliable  communi¬ 
cation  over  multipath  fading  channels  are  presented. 
An  idealized  channel  model  is  considered  first  in  order 
to  determine  the  loss  due  to  amplitude  fading.  The 
requirement  of  channel  estimation  is  demonstrated 
through  calculation  of  limits  for  channels  in  which  the 
state  of  the  fading  process  is  not  completely  known. 
Loss  incurred  due  to  the  limitation  of  practical  chan¬ 
nel  estimation  schemes  is  determined.  The  particular 
methods  of  channel  estimation  considered  are  pilot 
tone  extraction,  differentially  coherent  detection,  and 
the  use  of  a  pilot  symbol. 


where the Rician channel parameter $\gamma_R$ is the ratio of power in
the LOS component to that in the scattered component, and
$\Delta_J = \ln E_s - E_{p(x)}\{\ln |x|^2\}$ is a positive number obtained
from Jensen's inequality. As SNR → ∞, $I_u$ approaches a constant
value of $\Delta_J \log e + \log(1 + \gamma_R)$. For a Rayleigh channel
and a Gaussian distributed input, the AMI is bounded to less
than 0.83 bits/T.
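
The 0.83 bits/T figure can be reproduced numerically. The sketch below assumes a complex Gaussian input, so that |x|^2 equals E_s times a unit-mean exponential variable and the Jensen gap reduces to -E[ln U]; it also assumes a Rayleigh channel has a Rician parameter of zero. The function names, integration limits and step size are illustrative choices, not taken from the paper.

```python
import math


def mean_log_exponential(v_min=-40.0, v_max=5.0, step=1e-3):
    """E[ln U] for U ~ Exp(1), using u = exp(v) and the trapezoidal rule."""
    n = int(round((v_max - v_min) / step))
    total = 0.0
    for i in range(n + 1):
        v = v_min + i * step
        f = v * math.exp(v - math.exp(v))   # integrand after the substitution
        total += f if 0 < i < n else 0.5 * f
    return total * step


def high_snr_limit_bits(delta_nats, gamma_r=0.0):
    """High-SNR limit of the bound, Delta*log2(e) + log2(1 + gamma_R), in bits/T."""
    return delta_nats * math.log2(math.e) + math.log2(1.0 + gamma_r)


# Jensen gap for a Gaussian input (assumption: |x|^2 = E_s * U, U ~ Exp(1))
delta_gauss = -mean_log_exponential()
rayleigh_limit = high_snr_limit_bits(delta_gauss)
print(delta_gauss, rayleigh_limit)
```

Under these assumptions the Jensen gap equals Euler's constant (about 0.5772 nats), and the limiting value is about 0.833 bits/T, in line with the bound stated above.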

If  a  receiver  can  track  variations  in  the  phase  of  the  fading 
process,  then  it  is  reasonable  to  model  the  system  as  having 
ideal  fading  phase  information  but  no  fading  amplitude  infor¬ 
mation.  In  this  case,  entropy  power  relations  yield  an  upper 
bound  on  AMI  for  a  Rayleigh  channel  of  the  form 


I.  Ideal  Fading  Channels 

The  capacity  of  a  discrete-time  Rayleigh  fading  channel 
has  been  considered  by  Ericson  [1].  His  result  is  based  on 
the  idealistic  assumption  that  the  value  of  the  fading  process 
is  known  at  the  receiver  and  is  independent  with  respect  to 
discrete-time  symbol  intervals.  For  an  ideal  Nakagami  fading 
channel  and  integer  values  of  the  Nakagami  parameter  m,  the 
capacity  is 


C  =  (log2  e) 


(— ro)m 

r(m) 


No 


bits/T 


where  s  =  \  P(-)  is  the  gamma  function,  Ei(-)  is  the 

exponential  integral  function,  and  T  is  the  discrete-time  sig¬ 
naling  interval.  When  compared  to  the  capacity  of  an  additive 
white  Gaussian  noise  channel,  the  maximum  loss  in  average 
SNR  due  to  Nakagami  fading  is  where  ?/>(•)  is  Euler’s 

psi  function.  This  expression  of  loss  is  valid  for  any  m  >  0. 
The  capacity  of  a  Nakagami  fading  channel  also  represents  the 
capacity  of  a  Rayleigh  fading  channel  when  space  diversity 
combining  is  used.  In  this  case,  the  Nakagami  channel  pa¬ 
rameter  m  corresponds  to  the  number  of  antennae  used  in  the 
system.  In  terms  of  channel  capacity,  the  gain  in  SNR  achiev¬ 
able  through  the  use  of  antenna  diversity  is  eCE  — 
where  is  Euler’s  constant. 


II.  Incomplete  CSI 

In  a  Rician  fading  environment  with  no  CSI,  a  line-of-sight 
(LOS)  component  exists  which  is  normally  strong  enough  to 
support  the  transmission  of  information.  In  this  case,  the  scat¬ 
tered  component  is  sometimes  viewed  as  an  additional  source 
of  interference,  although  it  does  convey  a  small  amount  of 
information.  By  using  entropy  power  relations  [2],  one  may 
determine  an  upper  bound  to  the  average  mutual  information 
(AMI)  of  the  form 


$$I_u = \log\left[\frac{1 + \frac{E_s}{N_0}}{1 + \frac{\exp(-\Delta_J)}{1+\gamma_R}\,\frac{E_s}{N_0}}\right]$$


1This  work  was  performed  as  part  of  a  Ph.D.  thesis  at  Queen’s 
University  with  support  provided  by  TRIO  and  NSERC. 


$$I_u = \log\left[\frac{1 + \frac{E_s}{N_0}}{1 + \frac{\exp(-\Delta_J)}{2\pi}\,\frac{E_s}{N_0}}\right]$$


which approaches a value of $\Delta_J \log e + \log 2\pi$ as SNR → ∞.
With  ideal  fading  phase  information  and  sufficient  SNR,  data 
transmitted  via  the  symbol  phase  can  be  accomplished  with  an 
arbitrarily  small  probability  of  error.  In  addition,  a  discrete¬ 
valued  constellation  can  be  used  to  achieve  a  higher  data  rate 
than  a  continuous- valued  input. 


III.  Use  of  Channel  Estimation 
When CSI is obtained by means of practical estimation
methods, the AMI conditioned on knowledge of the channel
estimate is a function of both SNR and the Doppler frequency
$f_D$ of the fading process. When considering practical signal
constellations for coding (i.e. at rates of less than $\log_2 M$
bits/T with an M-point constellation), additional losses are
incurred due to the limitations of these estimation schemes.
For a Rayleigh fading channel with a normalized Doppler frequency
of $f_D T = 0.1$, systems which use pilot tone estimation
incur a loss of 1.0-1.5 dB. Under the same conditions, the loss
experienced through the use of differentially coherent detection
is roughly in the range of 3-4 dB. Systems based on pilot
symbol transmission exhibit losses in the range of 4.5-8.5 dB.
When using differential detection or pilot symbol transmission,
the equivocation of the channel cannot be made arbitrarily
small. The magnitude of this remaining uncertainty is
strongly affected by the value of $f_D T$.
Abstract  —  We  derive  a  compact  formulation  of  the
computational  cutoff  rate  of  binary  differential  phase 
shift  keying  (BDPSK)  over  a  correlated  Rayleigh  fad¬ 
ing  channel.  The  analysis  is  more  realistic  than  pre¬ 
vious  finite  state  Markov  models  of  a  fading  channel. 


I.  Introduction 

Compared  to  memoryless  channel  models,  there  are  few 
capacity  and  cutoff  rate  results  for  channels  with  memory, 
and  most  such  models  are  not  very  realistic  [1].  We  present 
an  exact  analysis  of  non-interleaved  binary  differential  phase 
shift  keyed  signaling  over  a  correlated  Rayleigh  fading  channel. 
The  lack  of  interleaving  forces  the  analysis  to  deal  with  the 
channel  memory  directly.  We  find  that  modeling  the  received 
channel  process  as  a  finite  order  Markov  process  allows  the 
sum-over-codewords  portion  of  the  computational  cutoff  rate 
calculation  to  be  performed  combinatorially.  This  provides  for 
a succinct formulation of the cutoff rate, $R_0$.


II. Results

On  a  Rayleigh  fading  channel,  the  probability  density  func¬ 
tion  of  received  signal,  y,  conditioned  on  the  N  symbol  trans¬ 
mitted  vector  x,  can  be  written 


$$p_N(\mathbf{y}|\mathbf{x}) = \frac{1}{\pi^{Nn}\,|R|}\, e^{-\mathbf{y}^{\dagger} X R^{-1} X^{\dagger} \mathbf{y}}$$


where we assume n samples per channel symbol. The total
channel correlation matrix, $R = R_f + \sigma^2 I$, is the sum of the
fading correlation matrix, $R_f$, and that of the additive white
noise of variance $\sigma^2$. The diagonal matrix X takes the vector
x along its diagonal, i.e., X = diag(x).

The  code  ensemble  average  probability  of  error  can  be 
bounded  by  the  combined  Union-Bhattacharyya  bound  [2]. 
Using  the  above  density,  and  simplifying  for  the  case  of 
BDPSK  signaling,  we  have  the  code  ensemble  bound 


•S-  M  -  1  v  2"” 

*-  2N  \R-'  +  XR-'X1\’ 


where M is the number of codewords in the code and we
sum over all binary sequences X. If we assume that the received
channel process is auto-regressive of order L, then the
inverse channel matrix, $R^{-1}$, will be Toeplitz banded diagonal
in form, except for L × L sample blocks at each end of the
diagonal [3].

The trick here is recognizing that off-diagonal entries of
the denominator's matrix will be either zero or a constant
non-zero value, depending on whether the phase shift between
symbols in X is zero or π, respectively, provided that we
sample at a rate of at least L samples per symbol. The
zero off-diagonal entries will then pinch off the matrix into a


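block diagonal form. For example,

$$R^{-1} + X R^{-1} X^{\dagger} = \begin{bmatrix} \square & & \\ & \square & \\ & & \square \end{bmatrix},$$

a schematic in which each $\square$ stands for a nonzero band Toeplitz block and blank positions are zero.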

Summing over all binary sequences of X in the bound is
then equivalent to summing over all possible block partitions
of the matrix. This then corresponds to summing over all
integer partitions of N. If we define D(m) as the determinant
of the m × m symbol band Toeplitz block of $R^{-1}$, the
combinatorics of the partitioning allows us to write


_  2  1 

P*<^Zk!  Bn.x 


1!  2! 


(N-k+1)! 

D(l)  ’  D(2)  ’  *  *  *  ’  D(N  —  k  +  1) 


where $B_{N,k}(\cdot)$ is the (N, k) Bell polynomial [4]. We define the
m × m symbol matrix R(m) such that its inverse, $R^{-1}(m)$,
equals the m symbol inverse channel correlation matrix, $R^{-1}$,
except that we extend the inner Toeplitz portion of the matrix
into the L × L sample blocks at either end of the diagonal,
overwriting them. Thus, $|R(m)| = 1/D(m)$.

Using  a  relation  between  Bell  polynomials  and  the  compo¬ 
sition  of  Taylor  series  [4],  and  between  the  exponential  growth 
of  the  coefficients  of  a  Taylor’s  series  and  its  radius  of  conver¬ 
gence  [5],  we  can  then  formulate  the  computational  cutoff  rate 
as  follows. 

Theorem 1 Define the generating function for the determinants
of the m symbol Toeplitz extended channel correlation
matrices as

$$g(t) = \sum_{m=1}^{\infty} |R(m)|\, t^m = |R(1)|\,t + |R(2)|\,t^2 + |R(3)|\,t^3 + \cdots$$

The computational cutoff rate is then given by

$$R_0 = \log_2(2|t_0|) \quad \text{[bits/symbol]},$$

where $t_0$ is the smallest magnitude singularity of the function

$$\frac{1}{1 - g(t)}.$$
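
As a rough illustration of how the theorem might be evaluated numerically, the sketch below truncates g(t) to a finite list of determinant values and locates the smallest positive root of g(t) = 1 by bisection. It assumes all coefficients are positive, so that the smallest-magnitude singularity of 1/(1 - g(t)) lies on the positive real axis, and that this root falls inside the radius of convergence of g. The function name and the geometric test sequence are hypothetical and are not channel data from the paper.

```python
import math


def cutoff_rate(dets, tol=1e-12):
    """R0 = log2(2*t0), with t0 the smallest positive root of g(t) = 1.

    dets[m-1] plays the role of |R(m)|; the series is truncated to len(dets) terms.
    """
    if not dets or any(c <= 0 for c in dets):
        raise ValueError("expects a non-empty list of positive determinants")

    def g(t):
        acc = 0.0
        for c in reversed(dets):       # Horner evaluation of sum_m dets[m-1] * t**m
            acc = (acc + c) * t
        return acc

    lo, hi = 0.0, 1.0
    while g(hi) < 1.0:                 # g is increasing for t > 0, so widen the bracket
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    t0 = 0.5 * (lo + hi)
    return math.log2(2.0 * t0)


# Hypothetical determinant sequence |R(m)| = a**m, for which t0 = 1/(2a) in closed form.
a = 0.75
r0 = cutoff_rate([a ** m for m in range(1, 201)])
print(r0)
```

For the geometric sequence used here, g(t) = at/(1 - at), so the pole of 1/(1 - g(t)) sits at t = 1/(2a) and the sketch should return log2(1/a), which gives a closed-form check of the root search.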
This  paper  focuses  on  the  construction  of  M-PSK  block
modulation  codes  for  the  Rayleigh  fading  channel.  We  present 
some  new  codes  constructed  by  two  different  methods.  The 
first  method,  an  exhaustive  computer  search,  is  appropriate 
for  short  block  lengths.  Some  optimum  codes  are  found.  The 
second  method  is  for  multilevel  block  codes.  In  this  case  we 
use  the  cutoff  rate  performance  criterion  for  multistage  decod¬ 
ing.  Simulation  results  are  presented.  From  them  we  conclude 
that  the  second  method  can  propose  codes  which  achieve  im¬ 
provement  over  known  codes  for  low  and  moderate  SNR’s. 

Codes  found  by  a  computer  search 

The block codes so far supplied by the literature [1, 2]
were constructed using the multilevel coding technique. Good
codes can be constructed by the multilevel technique (with the
intrinsic advantage of a multistage decoder), but this technique
does not always lead to optimum codes. If the block
length of the code n is small and M = 4, 8, 16, it is possible to
generate all $M^n$ sequences of M-PSK symbols and store the
subset of sequences with the greatest number of elements that
satisfies a specified design criterion.

Our aim was to construct codes based on the following
design criterion: “For a given code length n and a given value
of the desired minimum Hamming distance $d_H$, find the code
with the greatest rate R (bits/symbol) such that the minimum
product distance $d_P$ is not less than 7.”

By a computer search, some new codes with lengths n =
4, 5, 6, 7 (number of M-PSK symbols) and different minimum
Hamming distances were found for 4-PSK and 8-PSK modulation.
We will show simulation results for a 4-PSK code with
n = 6, R = 1 and $d_H = 4$ that cannot be constructed as a
multilevel code. This code has a coding gain of about 14.0 dB
over the uncoded 2-PSK system, at a bit error probability
($P_b$) of $10^{-3}$.

Multilevel  block  codes 

We consider multilevel block codes constructed based on
a sequence of binary partitions of the $2^m$-PSK modulation.
The m-level code consists of the binary component codes
$B_0, B_1, \ldots, B_{m-1}$ with rates $R(0), R(1), \ldots, R(m-1)$,
respectively. The method for constructing multilevel codes we
propose deals with the following question: For a given rate
$R = R(0) + R(1) + \cdots + R(m-1)$ of the m-level code and a given
SNR of the channel, how can the rates $R(j)$, $0 \le j \le m-1$,
be chosen in such a way that the word error probability
($P_e$) for multistage decoding of the m-level code is minimized?
This question is answered in [3] based on the cutoff
rate performance criterion. This criterion leads to the rates
$R(j)$, $0 \le j \le m-1$, that minimize an upper bound on $P_e$ of
multistage decoding.

Assuming  a  Rayleigh  fading  channel  with  ideal  interleaving, 
ideal  coherent  detection  and  perfect  channel  state  information, 
we  have  obtained  the  optimum  rates  for  the  component  codes 
of  4-PSK,  8-PSK  and  16-PSK  block  modulation  codes  for  dif¬ 
ferent  values  of  SNR’s. 


Knowing the optimum rates for a given SNR we can construct
multilevel block codes with the help of Verhoeff's table.
For example, the optimum rates $R_{op}(j)$, j = 0, 1, 2, for a fixed
rate R = 2.0 bits/symbol of the 8-PSK block code and an
SNR of 15.0 dB are: $R_{op}(0) = 0.4362$, $R_{op}(1) = 0.7349$ and
$R_{op}(2) = 0.8289$. If we choose n = 16, we can approximate the
optimum rates with the codes: $B_0 = (16, 7, 6)$, $B_1 = (16, 12, 2)$
and $B_2 = (16, 13, 2)$. If we use $B_0 = (16, 7, 6)$, $B_1 = (16, 11, 4)$
and $B_2 = (16, 15, 2)$, we get a code with R = 2.06.
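
The rate bookkeeping for these component-code choices can be verified with a small sketch. The helper name is illustrative; it only sums the component rates k/n and reports the smallest component Hamming distance, which is an assumption about how the overall minimum Hamming distance is read off in this summary.

```python
from fractions import Fraction


def multilevel_rate(components):
    """Per-level rates, total rate (bits/symbol) and smallest component distance.

    components is a list of (n, k, d) binary block codes of equal length n.
    """
    lengths = {n for n, _, _ in components}
    if len(lengths) != 1:
        raise ValueError("all component codes must share the same length")
    n = lengths.pop()
    level_rates = [Fraction(k, n) for _, k, _ in components]
    min_distance = min(d for _, _, d in components)
    return level_rates, sum(level_rates), min_distance


# First choice: approximates the optimum rates 0.4362, 0.7349, 0.8289 at n = 16.
rates_a, total_a, dmin_a = multilevel_rate([(16, 7, 6), (16, 12, 2), (16, 13, 2)])
# Second choice, with R = 33/16, i.e. about 2.06 bits/symbol.
rates_x, total_x, dmin_x = multilevel_rate([(16, 7, 6), (16, 11, 4), (16, 15, 2)])
print([float(r) for r in rates_a], float(total_a), float(total_x))
```

The first set gives level rates 0.4375, 0.75 and 0.8125, which sum to exactly 2.0 bits/symbol; the second sums to 33/16 = 2.0625, matching the quoted R = 2.06.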

Figure 1 shows the behaviour of $P_b$ for two different 8-PSK
block codes. Code X is the above mentioned code with
R = 2.06. Code Y is a code of the same rate R and minimum
Hamming distance 4 constructed with component codes $B_0 =
B_1 = B_2 = (16, 11, 4)$. Code X shows better performance than
code Y for low and moderate SNR's. For $P_b = 10^{-3}$, code X
has a coding gain of about 1.0 dB over code Y. For SNR's
higher than 17.0 dB, the performance of code Y is superior.
This behaviour can be explained by the fact that code Y has
a larger value of $d_H$, which is the dominating performance
parameter for high SNR's.


This  work  has  been  partially  supported  by  CNPq  un¬ 
der  Grant  301045/92-5  and  CPqD-Telebras  under  contract 
387/90. 
Abstract  —  In  this  paper,  new  block  coded  8-PSK
modulations  with  unequal  error  protection  (UEP)  ca¬ 
pabilities  for  Rayleigh  fading  channels  are  presented. 
The  proposed  codes  are  based  on  the  multilevel  con¬ 
struction  of  Imai  and  Hirakawa  [1].  It  is  shown  that 
the  use  of  linear  UEP  (LUEP)  codes  [2]  as  component 
codes  in  one  or  more  of  the  encoding  levels  provides 
increased  error  performance  with  respect  to  conven¬ 
tional  multilevel  codes. 

I.  Summary 

Previous  work  on  combining  LUEP  codes  and  PSK  modu¬ 
lation  for  fading  channels  is  reported  in  references  [3]  and  [4]. 
Hagenauer  et  al.  [3]  proposed  rate- compatible  punctured  con¬ 
volutional  codes  combined  with  DQPSK  modulation  to  pro¬ 
vide  UEP  by  means  of  their  variable  rate  structure.  Refer¬ 
ence  [4]  used  Gray  labeling  of  a  QPSK  signal  set  to  map  LUEP 
codes  of  even  length  onto  block  modulation  codes  with  UEP 
capabilities.  Seshadri  and  Sundberg  [5]  studied  the  UEP  ca¬ 
pabilities  of  multilevel  codes  of  length  8  over  Rayleigh  fading 
channels.  The  aim  of  this  research  work  is  to  design  efficient 
block  coded  modulations  (BCM)  over  8-PSK  signal  sets  for  the 
specific purpose of UEP over Rayleigh fading channels. Over a
fading channel, the minimum symbol and product distances are
the parameters that dominate the overall error performance.
The symbol distance is closely related to the Hamming distance
of the component codes. Thus it is natural to consider
binary LUEP codes as component codes in the multilevel
construction to obtain good BCM for UEP over fading channels.

Let S represent a unit-energy 8-PSK signal set. A label
$k = b_1 + 2b_2 + 4b_3$ represents the signal point $e^{jk\pi/4}$, for
$0 \le k < 8$, where $j = \sqrt{-1}$, and $b_i \in \{0, 1\}$, $1 \le i \le 3$. In
multilevel block coded modulation [1], codewords of three linear
binary block codes of length n, dimension $k_i$ and minimum
distance $d_i$, denoted $C_i$, are used to select label bits $b_i$, for
$1 \le i \le 3$. The set of resulting sequences of n 8-PSK signals
is said to be a block modulation code $\Lambda$ of length n and rate
$R = (k_1 + k_2 + k_3)/n$ bits/symbol.

A two-level (n, k) LUEP code is a linear code that
is not spanned by its set of minimum weight vectors. We
use UEP(n, k) to denote such a code and refer to its unequal
error protection capabilities as follows: separation vector
$s = (s_1, s_2)$ for the message space $\{0, 1\}^{k^{(1)}} \times \{0, 1\}^{k^{(2)}}$, where
$k = k^{(1)} + k^{(2)}$. This means that codewords in correspondence
to $k^{(i)}$ information bits are at a Hamming distance at least $s_i$,
i = 1, 2. Without loss of generality, it is assumed that $s_1 > s_2$.
Thus an information vector of length k bits can be separated
into a most significant part of length $k^{(1)}$ bits (the MSB) and
a least significant part of length $k^{(2)}$ bits (the LSB). The proposed
multilevel construction uses an $(n, k_2, d_2)$ linear code,
or a UEP(n, $k_2$) code, $C_2$ in the second encoding level and a
UEP(n, $k_3$) code $C_3$ in the third encoding level.

1This work was supported in part by NASA under grant NAG
5-931, by the NSF under grants NCR-88813480 and NCR-9115400,
and by the Japanese Society for the Promotion of Science (JSPS)
under fellowship no. 93157.

As an example, let $C_1$, $C_2$ and $C_3$ be (8, 4, 4), (8, 7, 2) and
(8, 7, 2) linear codes, respectively. The Imai-Hirakawa multilevel
construction results in a block modulation code $\Lambda_1$ of
length 8, rate R = 2.25 bits/symbol, minimum symbol distance
$\delta_H = 2$ and minimum product distance $\Delta_P^2 = 4$ [5].
By letting $C_3$ be a binary optimal LUEP code, UEP(8, 5),
from [6] with separation vector s = (3, 2) for the message space
$\{0, 1\}^4 \times \{0, 1\}$, a block modulation code $\Lambda_2$ is obtained. $\Lambda_2$
has length 8, rate R = 2 bits/symbol, $\delta_H = 2$ and $\Delta_P^2 = 4$. In
addition, 25% of the information bits (the 4 MSB encoded
by UEP(8, 5)) have corresponding symbol and product distances
equal to 3 and 64, respectively. That is, a subset of the
coded sequences, those corresponding to the MSB encoded by
the LUEP code, has increased symbol and product distances.
It follows that, with no bandwidth expansion over uncoded
QPSK, higher error performance is achieved. In the presentation,
simulation results will be presented showing an increase
in both overall coding gain and that for the most important
message part. At a bit error rate (BER) of $10^{-3}$, the coding
gain in the third level is at least 13 dB for $\Lambda_2$, compared
to about 8.5 dB for $\Lambda_1$. In addition, at a BER of $10^{-3}$, an
advantage of 2 dB in overall coding gain is achieved.
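
The rate and distance figures in this example can be reproduced with a short sketch. It assumes the natural labelling described earlier, for which the squared Euclidean distances separated by label bits 1, 2 and 3 of unit-energy 8-PSK are 4 sin^2(pi/8), 2 and 4, and it assumes the minimum product distance is taken over the levels that attain the minimum symbol distance. The function and constant names are illustrative, not from the paper.

```python
import math

# Squared Euclidean distance protected by label bits b1, b2, b3 of unit-energy 8-PSK
# (assumption: natural labelling, point index = b1 + 2*b2 + 4*b3).
LEVEL_SQ_DIST = (4 * math.sin(math.pi / 8) ** 2, 2.0, 4.0)


def bcm_parameters(components):
    """Rate, minimum symbol distance and matching product distance of a 3-level BCM.

    components is a list of three (n, k, d) binary codes, one per label bit.
    """
    n = components[0][0]
    rate = sum(k for _, k, _ in components) / n
    # Level i contributes symbol distance d_i and product distance (delta_i^2)^d_i.
    per_level = [(d, sq ** d) for (_, _, d), sq in zip(components, LEVEL_SQ_DIST)]
    symbol_distance = min(d for d, _ in per_level)
    product_distance = min(p for d, p in per_level if d == symbol_distance)
    return rate, symbol_distance, product_distance


code_1 = bcm_parameters([(8, 4, 4), (8, 7, 2), (8, 7, 2)])
# UEP(8, 5) with separation vector (3, 2): overall minimum distance 2 at level 3.
code_2 = bcm_parameters([(8, 4, 4), (8, 7, 2), (8, 5, 2)])
msb_product = LEVEL_SQ_DIST[2] ** 3   # MSB part of level 3, protected by distance 3
print(code_1, code_2, msb_product)
```

Under these assumptions the first code gives rate 2.25 with symbol distance 2 and product distance 4, the second gives rate 2 with the same distances, and the MSB part of the third level reaches product distance 64 at symbol distance 3, as quoted in the text.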

References 

[1] H. Imai and S. Hirakawa, “A New Multilevel Coding Method
Using Error Correcting Codes,” IEEE Trans. Info. Theory, vol.
IT-23, no. 3, pp. 371-376, May 1977.

[2] B. Masnick and J. Wolf, “On Linear Unequal Error Protection
Codes,” IEEE Trans. Info. Theory, vol. IT-13, no. 4, pp. 600-607,
Oct. 1967.

[3] J. Hagenauer, N. Seshadri and C.-E. W. Sundberg, “The Performance
of Rate-Compatible Punctured Convolutional Codes
for Digital Mobile Radio,” IEEE Trans. Communications, vol.
38, no. 7, pp. 966-980, July 1990.

[4] R.H. Morelos-Zaragoza and S. Lin, “Block QPSK Modulation
Codes With Two Levels of Error Protection,” Proceedings of
the Fifth IEEE International Symposium on Personal, Indoor
and Mobile Communications (PIMRC '94), vol. II, pp. 548-552,
The Hague, The Netherlands, Sept. 1994.

[5] N. Seshadri and C.-E. W. Sundberg, “Coded Modulation with
Time Diversity, Unequal Error Protection, and Low Delay for
the Rayleigh Fading Channel,” 1st. Conference on Universal
Personal Communications (ICUPC '92), Conf. Rec., pp. 283-287,
Dallas, Texas, Sept. 1992.

[6] W.J. Van Gils, “Two Topics on Linear Unequal Error Protection
Codes: Bounds on Their Length and Cyclic Code Classes,”
IEEE Trans. Info. Theory, vol. IT-29, no. 6, pp. 866-876, Nov.
1983.




A  Change-Detection  Approach  to  Monitoring 
Fading  Channel  Bandwidth 

Steven  D.  Blostein1  and  Yong  Liu 

Dept.  Elect.  &  Comp.  Eng.  Queen’s  University, 

Kingston,  Ontario,  Canada  K7L  3N6  Email:  sdb@ee.queensu.ca 


Abstract - The statistics of mobile communications over frequency-nonselective fading channels are determined largely by fading bandwidth, which is related to vehicle speed. On-line estimation of fading bandwidth can be used to optimize coherent signal transmission, as well as to improve handoff algorithms. In the following, level crossing rates of the received signal amplitude, combined with recent on-line change detection techniques [1], are used to estimate fading bandwidth. The proposed estimator takes AWGN into account and has low complexity and processing delay. Applied to adaptive on-line tracking of fast fading channel parameters as performed in [2], significant BER reduction is demonstrated, particularly in situations where vehicle speed increases abruptly.


I.  The  Model 

Signal $x_k$ is transmitted over a frequency-nonselective Rician fading channel. The received low-pass equivalent discrete-time signal is $y_k = x_k c_k + n_k$, where $c_k$ is the channel gain. Let $c_k$ have mean $a = E\{c_k\}$ and covariance function

$$r_n = r_0 J_0(2\pi f_m n T) = r_0 \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j 2\pi f_m n T \sin\theta} \, d\theta \qquad (1)$$

where $J_0(\cdot)$ is the 0th order Bessel function, $T$ is the symbol period and $f_m = v/\lambda$ is the maximum Doppler frequency (fading channel bandwidth), with $v$ and $\lambda$ defined as mobile vehicle speed and transmission wavelength, respectively.


II.  Monitoring  Vehicle  Speed  by  Measuring 
Level  Crossing  Rate 

In [3], it is shown that the number of crossings at voltage level $A$ is

$$n(|c_k - a| = A) = n(R) = \sqrt{2\pi}\, f_m R\, e^{-R^2} \qquad (2)$$

and the average fade duration is

$$t(|c_k - a| = A) = t(R) = \frac{1}{\sqrt{2\pi}\, f_m R} \left(e^{R^2} - 1\right) \qquad (3)$$

with $R = A/\sqrt{r_0}$ and $r_0 = \mathrm{Var}(c_k) = E(|c_k|^2) - |a|^2$. Therefore, measuring the level crossing rate yields an estimate of $v$ and $f_m$. By choosing $R = -5$ dB, i.e., $A = 0.5623\sqrt{r_0}$ as in [3],

$$v = \frac{1.6579\,\lambda}{2\pi\, t(R)} = \frac{6.1154\,\lambda\, n(R)}{2\pi}. \qquad (4)$$
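As a quick numerical check of how eqs. (2) and (3) lead to the constants in eq. (4), the following sketch inverts both relations at $R = -5$ dB. The function names are illustrative and not taken from the paper; only the formulas themselves come from the text above.

```python
import math

# R = A / sqrt(r_0) as a linear amplitude ratio; -5 dB is the value used in the text.
R = 10 ** (-5.0 / 20.0)

def doppler_from_crossing_rate(n_cross, r=R):
    """Invert eq. (2): n = sqrt(2*pi) * f_m * R * exp(-R^2)."""
    return n_cross / (math.sqrt(2.0 * math.pi) * r * math.exp(-r * r))

def doppler_from_fade_duration(t_fade, r=R):
    """Invert eq. (3): t = (exp(R^2) - 1) / (sqrt(2*pi) * f_m * R)."""
    return (math.exp(r * r) - 1.0) / (math.sqrt(2.0 * math.pi) * r * t_fade)

def speed(f_m, wavelength):
    """Vehicle speed from the maximum Doppler frequency: v = f_m * lambda."""
    return f_m * wavelength

# Constants of eq. (4): v = c_t * lambda / (2*pi*t) = c_n * lambda * n / (2*pi).
c_t = 2.0 * math.pi * doppler_from_fade_duration(1.0)
c_n = 2.0 * math.pi * doppler_from_crossing_rate(1.0)
print(round(c_t, 4), round(c_n, 4))
```

Both constants depend only on the chosen threshold $R$, so a different crossing level would simply change the two numbers in eq. (4).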
III. Measuring Level Crossing Rate Using
Change Detectors

Since $y_k$ contains noise, rather than count level crossings directly, we view the problem as one of sequential change detection, as follows. Letting $z_i$ denote the power of $y_i$, it can be shown that for reasonably large SNR

$$z_i \approx |c_i|^2 + c_i^* x_i^* n_i + c_i x_i n_i^* \qquad (5)$$

Since $c_i$ has small bandwidth relative to $n_i$, $c_i$ can be treated as locally deterministic. Conditioned on $c_i$, it can be shown that $z_i$ is Gaussian with mean $E\{z_i\} = |c_i|^2$ and variance

$$\mathrm{Var}\{z_i\} = 2|c_i|^2 \sigma^2 \qquad (6)$$

where $\sigma^2$ is the variance of $n_i$. The channel energy $|c_i|^2$ equals $A_0^2 = r_0 + |a|^2$ nominally and drops to below $A_1^2 = 0.5623^2 r_0 + |a|^2$ during a fade. From the above, the problem can be transformed into one of quickest detection of a change from

$$H_0: \; z_i \sim f_0(z_i) = \frac{1}{\sqrt{4\pi A_0^2 \sigma^2}} \exp\!\left(-\frac{(z_i - A_0^2)^2}{4 A_0^2 \sigma^2}\right)$$

to

$$H_1: \; z_i \sim f_1(z_i) = \frac{1}{\sqrt{4\pi A_1^2 \sigma^2}} \exp\!\left(-\frac{(z_i - A_1^2)^2}{4 A_1^2 \sigma^2}\right)$$

and vice versa. We have investigated a two-sided Page's cumulative-sum (CUSUM) statistic, as well as an alternative change-detection procedure [1] that is well-suited to cases where the change-time is known to be finite. The resulting fading bandwidth estimator has been applied to the adaptive fading channel tracker, DFALP, described in [2], as follows: the optimal DFALP linear prediction and LPF filter parameters are first recorded off-line for a set of fading bandwidths in constant speed conditions. In on-line use, the DFALP parameters are then adjusted adaptively in steps of 20 km/hour. The BER performance of differential quadrature phase-shift keying (DQPSK) detection is used as a reference. Simulations show that when the vehicle speed increases from 60 to 100 km/hour (in Rician fading with $|a|^2/r_0 = 4$ dB), a 5 dB gain is observed over both DQPSK and DFALP (without parameter adaptation) at BER $6.30 \times 10^{-3}$ and SNR = 20 dB. Noticeable gains are also observed if the SNR is greater than 12 dB.
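The change-detection view can be illustrated with a minimal one-sided Page CUSUM sketch that watches for the nominal-to-fade transition only. The paper uses a two-sided statistic and a second procedure from [1], neither of which is reproduced here. The Gaussian model with mean $A^2$ and variance $2A^2\sigma^2$ follows eq. (6); the function names, threshold and sample values are hypothetical.

```python
import math

def log_pdf(z, a_sq, sigma_sq):
    """Log of a Gaussian density with mean a_sq and variance 2*a_sq*sigma_sq (eq. (6))."""
    var = 2.0 * a_sq * sigma_sq
    return -0.5 * math.log(2.0 * math.pi * var) - (z - a_sq) ** 2 / (2.0 * var)

def cusum_alarm(samples, a0_sq, a1_sq, sigma_sq, threshold):
    """Return the index of the first alarm for a change from H0 to H1, or None."""
    g = 0.0
    for i, z in enumerate(samples):
        llr = log_pdf(z, a1_sq, sigma_sq) - log_pdf(z, a0_sq, sigma_sq)
        g = max(0.0, g + llr)  # Page's recursion: accumulate evidence, clip at zero
        if g > threshold:
            return i
    return None

# Noise-free illustration: nominal energy for 50 samples, then a fade.
a0_sq, a1_sq = 1.0, 0.5623 ** 2      # assumes r_0 = 1 and a = 0
samples = [a0_sq] * 50 + [a1_sq] * 50
alarm = cusum_alarm(samples, a0_sq, a1_sq, sigma_sq=0.01, threshold=5.0)
```

Counting such alarms per unit time would then stand in for the level crossing count of eq. (2); the threshold trades detection delay against false alarms and would need tuning for a real channel.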
Abstract - In this paper we consider impulse response statistics of the wide-sense stationary uncorrelated scattering (WSSUS) multi-path channel that results from a stationary scattering field with either the transmitter or receiver in motion, but not both.

Summary 

Let $h(\tau; t)$ denote the time-varying impulse response of the channel. By delay uncorrelated scattering we mean

$$E\left[h(\tau_a; t)^* h(\tau_b; t + \Delta t)\right] = 2\,\phi_h(\tau_a; \Delta t)\,\delta(\tau_b - \tau_a).$$

In this paper we derive general formulas for the correlation function $\phi_h(\tau; \Delta t)$ and the scattering function $S(\tau; \lambda) = \int \phi_h(\tau; \Delta t)\, e^{-j 2\pi \lambda \Delta t} \, d\Delta t$, where $\lambda$ is the Doppler frequency shift variable.

This general family of channels has been the subject of a great deal of research. The commonly cited result is the correlation function $\phi_h(\tau; \Delta t) \propto J_0(2\pi \lambda_m \Delta t)$, where $\lambda_m$ is the maximal Doppler frequency, due to Jakes [2]. Here we show that, for arbitrary scattering fields, Jakes' result is actually just the 0th term in the series

$$\phi_h(\tau; \Delta t) = 2\pi \sum_{n=-\infty}^{\infty} \psi_n(\tau)\, e^{j n (\theta_0 + \pi/2)} J_n(2\pi \lambda_m \Delta t)$$

where $\theta_0$ is the velocity vector angle relative to the base-to-mobile baseline. The series coefficients $\psi_n(\tau)$ are determined from the spatial distribution of scatterers and propagation path loss factors.

For the case of a spatially uniform scattering field with $1/r^2$ propagation loss factors, we obtain

$$\psi_n(\tau) = \frac{2 c \phi_p}{c\tau\left((c\tau)^2 + r_0^2\right)\sqrt{1 - \alpha(\tau)^2}} \left(\frac{\alpha(\tau)}{1 + \sqrt{1 - \alpha(\tau)^2}}\right)^{|n|}$$

where $\phi_p$ is the spatial scattering intensity, $r_0$ is the base-to-mobile baseline length, $c$ is the speed of propagation, and $\alpha(\tau) = 2 c \tau r_0 / [(c\tau)^2 + r_0^2]$. As opposed to uniformly distributed scatterers, Jakes' derivation emphasizes scattering near the mobile unit.

In addition to the classical 2-D mobile problem discussed above, we also derive results for some 3-D problems.

Our analysis is based on the theory of generalized stochastic processes [1]. A generalized process is a continuous random linear functional on a topological vector space of test functions. This theory is a direct extension of the well known theory of generalized functions (also known as “distribution theory”). We assume a spatially uncorrelated scattering field with a spatial scattering intensity function. The field may be either diffuse (white Gaussian) or specular (white Poisson). Our results are obtained by application of an elliptical change of variables to the appropriate spatial test function integral.

The channel impulse response $h(\tau; t)$ is a time-varying linear transformation of the spatial scattering field. Thus, the channel's 2nd order moments are completely determined by the spatial scattering field's 2nd order moments and the propagation geometry. The common method of analysis first considers discrete elemental scatterers and then passes to a limit of increasingly dense but vanishingly small scatterers. This limiting method is quite cumbersome and makes it difficult to identify the corresponding spatial scattering intensity; in our analysis, the limiting method is entirely unnecessary.

References

[1] L. Arnold, Stochastic Differential Equations: Theory and Application. New York: Wiley, 1974.

[2] W. C. Jakes, Microwave Mobile Communications. New York: Wiley, 1974.


Abstract - We show that particular versions of the densest lattice packings achieve very good performance over the Rayleigh fading channel. These versions not only have high diversity, but also may be decoded efficiently since they are binary lattices.

I.  Introduction 

The practical interest in lattice constellations with good performance over fading channels arises from the need to transmit information at high bit rates over terrestrial mobile radio links. Constellations matched to the fading channel are effective because of their high degree of diversity. By diversity we mean the minimum number of distinct component values between any two distinct points in the constellation. Signal constellations designed for Gaussian channels usually perform very poorly over Rayleigh fading channels since they have small diversity. We constructed signal constellations with high spectral efficiency matched to the Rayleigh fading channel using algebraic number theory [1]. The signal constellations are derived from the densest lattices ($D_4$, $E_6$, $E_8$, $K_{12}$, $\Lambda_{16}$, $\Lambda_{24}$) and their diversity order is half the lattice dimension.

II.  System  model 

Consider the following model. A mapper associates an $m$-tuple of input bits to a signal point $x = (x_1, x_2, \ldots, x_n)$ in the $n$-dimensional Euclidean space $\mathbb{R}^n$. Let $M = 2^m$ be the total number of points in the constellation. The points are transmitted over a Rayleigh channel giving $r = \alpha * x + n$, where $r$ is the received point, $n = (n_1, n_2, \ldots, n_n)$ is a noise vector whose real components $n_i$ are independent zero-mean Gaussian random variables with variance $N_0$, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ are the independent random fading coefficients with unit second moment, and $*$ represents the componentwise product.

The signal points $x$ are chosen from a constellation which is carved from a lattice $\Lambda$. The spectral efficiency is measured in number of bits per two dimensions, $s = 2m/n$, and the signal-to-noise ratio per bit is given by SNR $= E_b/N_0$, where $E_b$ is the narrow band average energy per bit and $N_0/2$ is the narrow band noise power spectral density.

III.  New  constellations 

An accurate analysis of the symbol error probability shows that the most important feature of a good constellation for the fading channel is its diversity $L$. The following theorem enables us to evaluate the diversity $L$ of any lattice constructed from an algebraic number field.

Theorem. The lattices obtained from the canonical embedding of an algebraic number field with signature $(r_1, r_2)$ exhibit a diversity $L = r_1 + r_2$.

Since totally complex cyclotomic fields have a signature $(0, n/2)$, the diversity of the corresponding lattices is $L = n/2$. We use the work of Craig [2, 3], who showed how to construct the lattices $E_6$, $E_8$ and $\Lambda_{24}$ (Leech lattice) from the totally complex cyclotomic fields $K = \mathbb{Q}(e^{i 2\pi/N})$ for $N = 9, 20, 39$. We applied the same procedure and found $D_4$ (Schläfli lattice), $K_{12}$ (Coxeter-Todd lattice) and $\Lambda_{16}$ (Barnes-Wall lattice) from the 8th, 21st and 40th roots of unity. These lattices are obtained by applying the canonical embedding to particular integral ideals of the above cyclotomic fields. The ideals are given in the table below. The lattices are denoted by $\Lambda_{n,L}$. Two generators for each ideal are given in the last column.


Lattice      N    Ideal
D_{4,2}      8    (2, θ + 1)
E_{6,3}      9    (3, (θ + 1)^2)
E_{8,4}      20   (5, θ - 2)
K_{12,6}     21   (7, θ + 3)
Λ_{16,8}     40   (2, θ^4 + θ^3 + θ^2 + θ + 1)(5, θ^2 + 2)
Λ_{24,12}    39   (3, θ^3 + θ^2 - 1)(13, θ - 3)(3, θ^3 + θ^2 + θ + 1)
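The dimensions and diversities in the table can be cross-checked from the theorem alone. The degree of the $N$-th cyclotomic field is Euler's totient $\varphi(N)$, and for $N > 2$ the field is totally complex, so its signature is $(0, \varphi(N)/2)$. The short sketch below computes both numbers for each $N$ in the table; the dictionary labels are only illustrative names for the lattices.

```python
from math import gcd

def euler_phi(n):
    """Euler's totient: the degree of the n-th cyclotomic field over Q."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def dimension_and_diversity(n_root):
    """Lattice dimension n = phi(N) and diversity L = r1 + r2 = n/2 (valid for N > 2)."""
    n = euler_phi(n_root)
    r1, r2 = 0, n // 2          # totally complex field: no real embeddings
    return n, r1 + r2

# N values from the table, keyed by illustrative lattice labels.
table = {"D4": 8, "E6": 9, "E8": 20, "K12": 21, "Lambda16": 40, "Lambda24": 39}
results = {name: dimension_and_diversity(N) for name, N in table.items()}
```

Each pair in `results` should match the two subscripts of $\Lambda_{n,L}$ in the corresponding table row.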

IV.  Results 

The figure below shows the performance over the Rayleigh fading channel of the rotated versions of the lattices $D_4$, $E_6$, $E_8$, $K_{12}$ and $\Lambda_{16}$. Simulations were made up to dimension 8, while for higher dimensions we have plotted upper bounds. The bit error probability is given as a function of $E_b/N_0$ for $s = 4$ bits/symbol. The slopes of the curves asymptotically correspond to the diversity order, which is 2, 3, 4, 6 and 8 respectively. At $10^{-3}$ the gain over $\mathbb{Z}^8$ is about 17 dB and it exceeds 25 dB at $10^{-5}$. It is important to notice that the maximal diversity reached with a reasonable trellis coded modulation does not exceed 6. The diversity of the rotated Leech lattice $\Lambda_{24,12}$ is 12. This is equivalent to a trellis coded QAM with $2^{44}$ states or a trellis coded PAM with $2^{22}$ states at 4 bits per symbol.

[Figure: bit error probability versus $E_b/N_0$ (dB).]
Abstract - The utilization of real-number DFT codes for a multiplicative channel is introduced in this paper. With the proposed encoding procedure, some redundancy can be added to the transmitted data. With this redundancy, syndromes for the parameters of a fading channel can be obtained from the received data. The decoding algorithm for real-number DFT codes can then be used to calculate the fading parameters from these syndromes.

I.  Introduction 

In 1981, Marshall first defined error control codes for real or complex data and suggested that real-number codes could have applications similar to those of Reed-Solomon codes. Wolf, taking a different view, treated real-number codes as a new technique for solving signal processing problems such as impulsive noise cancellation in information transmission.

A common feature of previous studies is that the channel error model is assumed to be additive. In this paper, the real-number decoding method for a multiplicative channel error model (which corresponds in practice to transmission over a fading channel) is investigated.

II.  Encoding  and  Decoding  Scheme  for 
a  Fading  Channel 

Usually the effect of a fading channel is modeled by a slowly varying component multiplying the transmitted signal, that is

$$r_i = y_i \cdot e_i + n_i \qquad (1)$$

where $y_i$ is the transmitted signal, $e_i$ the multiplicative parameter of the fading channel, $n_i$ the background noise, and $r_i$ the received signal. In a block coding scheme, we can also assume that the index $i$ is in the range $0, 1, 2, \ldots, N-1$.


A multiplication can be transformed into an addition by taking logarithms. However, since the signals under consideration are assumed to be complex, the complex logarithm function is required. It can be easily derived from eqn. (1) that

$$\log_c r_i = \log_c y_i + \log_c e_i + h_i \qquad (2)$$

where $h_i = \log_c\left(1 + \frac{n_i}{y_i \cdot e_i}\right)$. It should be noted that when $n_i \ll y_i \cdot e_i$, $h_i$ will approach 0. Since $e_i$ is slowly varying, both $e_i$ and $\log_c e_i$ can be viewed as lowpass signals. Therefore, it is reasonable to assume that $\log_c e_i$ can be obtained from the sum of some unknown low frequency components $E_k$, that is

log.e,-  (3) 

1=1 

where $k_l$ is the location of a nonzero frequency component, and $E_{k_l}$ is the magnitude of that component. Now suppose that $y_i$ is encoded as

$$y_i = \begin{cases} 1, & i = 0, 1, \ldots, N-K-1 \\ x_i, & i = N-K, N-K+1, \ldots, N-1 \end{cases} \qquad (4)$$

The first $N-K$ equations in eqn. (2) become the desired syndromes

$$S_i = \log_c r_i = \log_c e_i + h_i \qquad (5)$$

These noisy syndromes can readily be input to some decoding algorithm for the DFT codes to compute $E_k$, provided that the number of nonzero terms of $E_k$ in eqn. (5) is sufficiently small. After $E_k$ is computed, an estimate of $y_i$ can then be derived.

In this way, one can estimate the channel parameters $e_i$ and, at the same time, the transmitted data $x_i$.
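The syndrome construction of eqs. (1), (2) and (5) can be sketched numerically as follows. The block sizes, the data symbols, the deterministic "noise" and the exponential form chosen for the low-frequency components of $\log_c e_i$ are all assumptions made for illustration. The DFT-code decoding step that would recover $E_k$ from the syndromes is not shown.

```python
import cmath

N, K = 16, 8                        # block length and number of data symbols (hypothetical)
E = {0: 0.2 + 0.1j, 1: 0.05}        # hypothetical low-frequency components of log e_i

def log_fading(i):
    """log_c e_i modeled as a sum of a few low-frequency exponentials (assumed form)."""
    return sum(Ek * cmath.exp(2j * cmath.pi * k * i / N) for k, Ek in E.items())

data = [1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j] * 2           # K arbitrary data symbols x_i
y = [1.0] * (N - K) + data                               # eq. (4): ones first, then data
e = [cmath.exp(log_fading(i)) for i in range(N)]         # multiplicative fading parameters
noise = [1e-3 * cmath.exp(1j * i) for i in range(N)]     # small deterministic stand-in for n_i
r = [y[i] * e[i] + noise[i] for i in range(N)]           # eq. (1)

# Eq. (5): syndromes from the first N-K received samples, S_i = log e_i + h_i.
S = [cmath.log(r[i]) for i in range(N - K)]
h = [cmath.log(1 + noise[i] / (y[i] * e[i])) for i in range(N - K)]
```

Because the known symbols are equal to one, their logarithm vanishes and each syndrome is the log-fading term plus the small perturbation $h_i$. The complex logarithm is multivalued, so a practical scheme would also need to handle phase wrapping, which this example sidesteps by keeping the phases small.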
Abstract - This paper studies the construction of good time-varying convolutional codes from the relation between time-varying and time-invariant encoders.


where  g  *ifj  (D)  -  g  4$  *]j  D  +  g  *2j  D2  4  *  *  ■. 

The  k/(k  4  1)  time- varying  and  time- invariant  convolu 
tional  encoders  are  equivalent  if  g*=  g^p,  g*  =  9Vz,k+i  w^ere 


I.  Introduction 

In this study, we discuss the construction of good time-varying convolutional encoders, i.e. punctured convolutional codes, from time-invariant convolutional encoders.

In general, a $k/n$ time-varying convolutional code can be described by $p$ generator matrices of size $n \times k$, where $p$ is the period of the time variation. Such a $k/n$ time-varying convolutional code may have better error protection capability than the best convolutional code that has a time-invariant encoder with the same number of states.

The $k/(k+1)$ time-varying convolutional encoders discussed here are realized as $k/(k+1)$ punctured convolutional codes from $1/(k+1)$ convolutional codes. None of these encoders can have better error protection capability than the best time-invariant convolutional codes, since such a $k/(k+1)$ time-varying convolutional encoder can be translated into a time-invariant encoder with the same or a smaller number of states.

However, such a time-varying convolutional encoder has benefits in applications: the Viterbi decoder for a time-varying convolutional encoder has smaller complexity than an ordinary one.

Time-varying convolutional encoders have so far been studied independently of time-invariant codes. We show the transformation of $k/(k+1)$ time-varying convolutional encoders from equivalent $k/(k+1)$ time-invariant encoders, and discuss good time-varying convolutional encoders. Some of the time-varying convolutional encoders derived from the best known $k/(k+1)$ time-invariant codes have the same free distance as the known time-varying/punctured encoders.


x 


V 

z 


1 

1 


0 

9*3* 

0 

9*j,k  + 1 


~  ky-z+p, 

-  x  4  k  -  p, 

=:  [(k  —  *)  mod  k\  4*p, 

<  z  <  k, 

<  p  <  k, 

—  0  (for  p  <  j  <  k), 
=  0  (for  i  <  j  <  k), 


(1) 

(2) 


are satisfied. The proof and detailed discussion are omitted here.

Most of the best time-invariant convolutional codes satisfy conditions (1) and (2) after a permutation of the inputs and outputs. Let us consider a $2^M$-state time-invariant convolutional encoder which satisfies the conditions. The encoder is translated into a $2^M$-, $2^{M+1}$-, ..., or $2^{M+p}$-state time-varying convolutional encoder. The exact number of states follows from the equations above (the discussion is omitted here). We can easily find a $2^M$- or $2^{M+1}$-state time-varying convolutional encoder from the best or a good $2^M$-state time-invariant convolutional encoder, where we call a code good if it has the maximum free distance for the given number of states.


III.  Examples 

Here  we  show  the  two  examples  of  transformation.  The 
best  32-state  2/3  time -invariant  convolutional  code 

(  1  +  D  D  +  D2  l  +  D  +  D2  \ 
y  D7  1  1  4  D  4  D2  4  Dz  J  ’ 

is  translated  as  64-state  2/3  time  varying  code; 


II. Transformation between time-varying
and time-invariant convolutional encoders

The $k/(k+1)$ time-varying convolutional encoder discussed here (i.e. the $k/(k+1)$ punctured code) has period $k$ and $k$ generator matrices, of which only one is $1 \times 2$ and the other $k-1$ are $1 \times 1$. If the $1 \times 2$ generator matrix is used in the $t$-th interval, the $k/(k+1)$ time-varying convolutional encoder can be written with $k+1$ polynomials as

$$\{g_1(D),\ g_2(D),\ \ldots,\ (g_t(D)\ \ g_0(D)),\ g_{t+1}(D),\ \ldots,\ g_k(D)\},$$

where $g_j(D) = g_j^0 + g_j^1 D + g_j^2 D^2 + \cdots$. Let the generator matrix of the corresponding $k/(k+1)$ time-invariant convolutional encoder be


<7*1,1 

g*  1,2 

••  9*  i,fc+i 

9*2,1 

<7*2,2 

*■  £*2,k  +  l 

g*k,i 

9*ky  2 

•*  9*k,k+ 1 

{l+D2  +  D\  (14D34D5  14D4D24D34D44D54D6)}. 

the  best  64- state  3/4  time- invariant  convolutional  code 

D  4  D2  1  D2  14  D 

D  +  D4P2  1  D  +  D2  14  D2 

14D4D2  14D  14  D  1 

is  translated  as  64- state  2/3  time  varying  code 

{14D  4D2  4D3  44  4D6,  D  +  D2 +D* +  D* +  D\ 

(1  4  D2  4  D4  4  D5  4  D6  4  D7  1  4  D  4  D2  4  D3  4  D7)}  . 

The use of conditions (1) and (2) together with a free distance bound for convolutional codes gives an efficient algorithm to find good time-varying convolutional codes.
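The period-$k$ encoder structure described in Section II (one interval per period emits two output bits, the other $k-1$ intervals emit one) can be sketched as below. This is only an illustration of the structure, assuming a single shared input shift register. The tap lists are hypothetical and are not the generators listed in the paper.

```python
def tap_output(bits, t, taps):
    """Output at time t of a polynomial given as a tap list (taps[d] multiplies D^d)."""
    out = 0
    for d, tap in enumerate(taps):
        if tap and t - d >= 0:
            out ^= bits[t - d]
    return out

def time_varying_encode(bits, polys, g0, two_output_phase):
    """Encode with period k = len(polys).

    In phase j (0-based) the encoder emits the output of polys[j]; in the phase
    equal to two_output_phase it additionally emits the output of g0, so each
    period of k input bits produces k + 1 output bits.
    """
    k = len(polys)
    out = []
    for t in range(len(bits)):
        phase = t % k
        out.append(tap_output(bits, t, polys[phase]))
        if phase == two_output_phase:
            out.append(tap_output(bits, t, g0))
    return out

# Hypothetical rate-2/3 example: k = 2, two outputs in phase 0.
polys = [[1, 0, 1], [1, 1]]   # 1 + D^2 and 1 + D
g0 = [1, 1, 1]                # 1 + D + D^2
codeword = time_varying_encode([1, 0, 0, 0], polys, g0, two_output_phase=0)
```

With four input bits and $k = 2$ the encoder produces six output bits, matching the rate $k/(k+1) = 2/3$.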
Abstract - The construction and performance of cascaded convolutional codes are investigated. An interleaver is used between the inner and outer codes to redistribute errors out of the inner decoder. In addition, the structure of the interleaver is exploited to improve the distance properties of the overall cascaded code. This configuration is shown to have a performance advantage compared to a single complex convolutional code with the same rate and decoder complexity.


I.  Introduction 

In this paper, the design and performance of cascaded convolutional codes [1] for the additive white Gaussian noise channel are investigated. A cascaded convolutional code is the serial concatenation of two binary convolutional codes. They are decoded using the serial concatenation of the decoders corresponding to the two convolutional codes. In order to realize the full performance potential of cascaded convolutional codes, it is necessary to pass soft information from the inner decoder to the outer decoder [2]. In this work, the maximum a posteriori (MAP) algorithm developed and described in [3] is used to decode the inner code.


II.  A  Simple  Example 

A block diagram of a simple cascaded convolutional coding scheme is shown in Figure 1. The outer convolutional code is a maximal free distance (MFD) rate $k_1/n_1 = 2/3$ code with total encoder memory $\nu_1 = 3$ and free distance $d_{free1} = 4$. The generator matrix of this (3, 2, 3) code in nonsystematic feedforward form is given by

$$G_1(D) = \begin{bmatrix} 1 & D & 1 + D \\ D^2 & 1 & 1 + D + D^2 \end{bmatrix}.$$

Fig. 1: Block diagram of a cascaded convolutional coding scheme.

The inner code is a maximal free distance rate $k_2/n_2 = 3/4$ code with total encoder memory $\nu_2 = 3$ and free distance $d_{free2} = 4$. The generator matrix of this (4, 3, 3) code in nonsystematic feedforward form is given by

$$G_2(D) = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 + D & D & 1 \\ 0 & D & 1 + D^2 & 1 + D^2 \end{bmatrix}.$$

The inner and outer convolutional codes will be referred to as the component codes. The overall cascaded code has rate

$$R = \frac{k_1}{n_1} \times \frac{k_2}{n_2} = \frac{2}{4} = \frac{1}{2}.$$

If the generator matrices of the two codes in this example are multiplied, ignoring the effect of the interleaver, the resulting generator matrix, $G(D)$, is given by

$$G(D) = \begin{bmatrix} 1 & 1 & D + D^3 & D^2 + D^3 \\ D^2 & 1 + D^3 & 1 + D^2 + D^3 + D^4 & D + D^2 + D^3 + D^4 \end{bmatrix}$$

and the resulting code is called the composite code. The generator matrix $G(D)$ realizes a (4, 2, 7) code with $d_{free} = 6$.

Note that the constraint length of the composite code is greater than the sum of the constraint lengths of the component codes. (It may be that the (4, 2, 7) code is not in minimal form.) Unlike concatenated block codes and product codes, the free distance of the overall code is not the product of the free distances of the component codes. (Thus, using MFD codes for the component codes is not necessarily optimal.) However, by carefully designing the interleaver, the free distance of the cascaded code may be increased and made larger than that of a single convolutional code of the same complexity. In addition, cascaded convolutional codes tend to have less dense distance spectra than a comparable single code. As the code complexity increases, the sparse distance spectra of cascaded convolutional codes improve their performance at low and moderate signal to noise ratios.
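The composite generator matrix of Section II is the product $G_1(D)\,G_2(D)$ with polynomial arithmetic over GF(2), which can be checked mechanically. In the sketch below each polynomial is stored as an integer bitmask (bit $d$ is the coefficient of $D^d$); this representation and the helper names are choices made for the example, not part of the paper.

```python
def pmul(a, b):
    """Multiply two GF(2) polynomials stored as bitmasks (carry-less multiplication)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def matmul(A, B):
    """Product of two polynomial matrices over GF(2)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for m in range(inner):
                C[r][c] ^= pmul(A[r][m], B[m][c])
    return C

# Outer (3,2,3) and inner (4,3,3) generator matrices from the text.
G1 = [[0b1, 0b10, 0b11],
      [0b100, 0b1, 0b111]]
G2 = [[0b1, 0b1, 0b1, 0b1],
      [0b0, 0b11, 0b10, 0b1],
      [0b0, 0b10, 0b101, 0b101]]

G = matmul(G1, G2)

# Sum of the maximum row degrees of the composite matrix.
row_degree_sum = sum(max(p.bit_length() - 1 for p in row) for row in G)
```

The row degrees of the product are 3 and 4, which is where the memory of 7 quoted for the composite (4, 2, 7) code comes from.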


III.  CONCLUSION 

Cascaded convolutional codes appear to be a reasonable alternative to complex convolutional codes. The combination of soft-output decoding and interleaving enables cascaded codes to outperform a single code of the same complexity. Cascaded convolutional codes also lend themselves to a form of iterative decoding [4].
Abstract - An algorithm is presented to identify catastrophic encoders when the original rate 1/6 encoder is antipodal. The key technique is to use the syndrome former to determine the constraint length of the dual code. The major part of the algorithm solves a linear equation in $\nu + 1$ variables, where $\nu$ is the constraint length of the original rate 1/6 code.

I.  Introduction 

Both Viterbi decoding and sequential decoding of high-rate convolutional codes are greatly simplified by employing the class of punctured convolutional codes, which are obtained by periodically deleting a part of the bits of a low-rate code. The simple structure of the low-rate code can be utilized to encode and decode the high-rate code.

Good punctured convolutional codes are generally obtained by computer searches. During the searching procedure, catastrophic encoders, which result in an infinite number of decoded errors from a finite number of channel errors, must be identified. This appears to be a nontrivial problem since some deleting maps may result in catastrophic encoders even if the original code is noncatastrophic. Therefore, in order to speed up the search for good punctured codes, an efficient algorithm to identify catastrophic encoders is highly desirable.

In this work, we propose an algorithm to identify catastrophic encoders of rate $(n-1)/n$ punctured codes when the original encoder is antipodal. The algorithm is computationally efficient for both large and small constraint lengths.

From [2], we know that a punctured convolutional encoder obtained from an antipodal encoder is noncatastrophic if and only if it is minimum. The algorithm to be presented first finds a nonzero codeword of the dual of the punctured rate $(n-1)/n$ code. Since the dual code is a rate $1/n$ code, its minimum encoder can be easily found from any nonzero codeword. Thus, the overall constraint length of a minimum encoder of the dual code is determined. The constraint length of a minimum encoder is always equal to that of the minimum encoder of its dual. In this way, the minimality of the punctured encoder, and thus its catastrophic property, is determined.

II.  The  algorithm 

For a fixed deleting matrix and a finite weight sequence $x$, ext($x$) is defined to be the sequence that reduces to $x$ after puncturing with the deleting matrix and whose deleted positions are equal to zero. If we further define the dual of a convolutional code as the set of anti-Laurent sequences orthogonal to all the codewords of the convolutional code, we have the following two lemmas.

Lemma 1 A finite-weight sequence $x$ of $n$-dimensional vectors is in the dual of the punctured convolutional code if and only if ext($x$) is in the dual of the original rate 1/6 code.


Lemma 2 For any state of the syndrome former of the dual of an antipodal rate 1/6 convolutional code, there exist two $n$-dimensional vectors, say $x$ and $x'$, such that when the syndrome former starts from this state, the input ext($x$) (ext($x'$)) causes the syndrome former to transfer to another state with the all-zero output. Either of the two vectors can be found in no more than $n(\nu + 1)$ binary operations.

By these lemmas, we can establish the following algorithm to identify catastrophic encoders.

1. Initialize the adjoint-obvious realization [4] of $D^{\nu} G^T(D^{-1})$ as the all-zero state.

2. Find a sequence of $n$-dimensional vectors $(x_1, x_2, \ldots, x_{\nu+1})$ such that $x_1 \neq 0$ and $(\mathrm{ext}(x_1), \mathrm{ext}(x_2), \ldots, \mathrm{ext}(x_{\nu+1}))$ is a valid input sequence of the syndrome former with state transitions $(0, S_1, S_2, \ldots, S_{\nu+1})$.

3. Find a nontrivial solution $(b_1, \ldots, b_{\nu+1})$ of the equation

$$\sum_{i=1}^{\nu+1} b_i S_i = 0.$$

4.  Calculate  the  sum 

H-i 

t=l 

5. Represent $y$ as $n$ polynomials.

6. If all the degrees of the $n$ polynomials are less than $\nu$ or the degree of their greatest common divisor is larger than one, the punctured convolutional encoder is catastrophic.

END

This algorithm significantly reduces the computational complexity compared with all known algorithms [1, 3].
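Step 3 of the algorithm is a linear dependency search over GF(2): any $\nu + 1$ binary state vectors of length $\nu$ must be linearly dependent, so a nontrivial solution always exists. The following sketch finds one by Gaussian elimination on integer bitmasks. It illustrates only this step, under the assumption that each state $S_i$ is available as a $\nu$-bit integer. The syndrome former itself and the construction of $y$ are not modeled.

```python
def nontrivial_solution(states):
    """Return coefficients b (0/1, not all zero) with XOR of b_i * S_i equal to 0, or None.

    states: list of non-negative integers, each the bitmask of one state vector.
    """
    pivots = {}  # highest set bit -> (reduced vector, bitmask of the indices combined into it)
    for idx, s in enumerate(states):
        combo = 1 << idx
        while s:
            top = s.bit_length() - 1
            if top not in pivots:
                pivots[top] = (s, combo)   # new independent direction
                break
            reduced, used = pivots[top]
            s ^= reduced                   # eliminate the leading bit
            combo ^= used                  # remember which states were combined
        if s == 0:
            # The states selected in combo XOR to zero: a dependency was found.
            return [(combo >> i) & 1 for i in range(len(states))]
    return None

# Hypothetical example with nu = 3: four 3-bit states.
states = [0b101, 0b011, 0b110, 0b111]
b = nontrivial_solution(states)
```

The elimination stops at the first dependency it meets, so it never processes more than $\nu + 1$ states when the states have $\nu$ bits.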

References 

[1] K. J. Hole, “An algorithm for determining if a rate (n - 1)/n punctured convolutional encoder is catastrophic,” IEEE Transactions on Communications, vol. COM-39, pp. 386-389, Mar. 1991.

[2] K. J. Hole, “Rate k/(k + 1) minimum punctured convolutional encoders,” IEEE Transactions on Information Theory, vol. IT-37, pp. 653-655, May 1991.

[3] J. L. Massey and M. K. Sain, “Inverses of linear sequential circuits,” IEEE Trans. Comput., vol. C-17, pp. 330-337, Apr. 1968.

[4] G. D. Forney Jr., “Structural analysis of convolutional codes via dual codes,” IEEE Transactions on Information Theory, vol. IT-19, pp. 512-518, July 1973.




Generalized  Hamming  Weights  of  Convolutional  Codes1 

Joachim  Rosenthal2  Eric  Von  York 

Department  of  Mathematics,  University  of  Notre  Dame,  Notre  Dame,  IN  46556-5683 
e-mail:  Joachim.Rosenthal@nd.edu,  Eric.York@nd.edu 


Abstract - Motivated by applications in cryptology, K. Wei introduced in 1991 the concept of a generalized Hamming weight for a linear block code. In this paper we define generalized Hamming weights for the class of convolutional codes and derive several of their basic properties.

I.  Introduction 

An  important  set  of  code  parameters  defined  for  a  linear 
block  code  are  the  so  called  generalized  Hamming  weights  first 
introduced by Wei in [1]. By definition the r-th generalized
Hamming weight $d_r(C)$ of a linear block code C is equal to
the smallest support of any r-dimensional subcode of C. In
particular $d_0(C) = 0$ and $d_1(C)$ is equal to the distance of C.

In  this  way  every  [n,  k]  linear  block  code  has  associated  a 
whole  weight  hierarchy 

$$0 = d_0(C) < d_1(C) < \cdots < d_k(C) \le n. \qquad (1)$$
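
The weight hierarchy of a small block code can be checked by exhaustive search. The following is a minimal sketch, not part of the original paper: it assumes a binary code given by generator rows, and the function names and the choice of a [7, 4] Hamming code as test case are illustrative. It uses the fact that the support of a subcode is the union of the supports of any basis of it.

```python
from itertools import combinations, product


def nonzero_codewords(gen_rows):
    """All nonzero codewords of the binary code spanned by gen_rows, as int bit masks."""
    rows = [int("".join(map(str, row)), 2) for row in gen_rows]
    words = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        word = 0
        for c, row in zip(coeffs, rows):
            if c:
                word ^= row
        if word:
            words.add(word)
    return sorted(words)


def gf2_rank(vectors):
    """Rank over GF(2) of integers read as bit vectors (incremental XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
    return len(basis)


def weight_hierarchy(gen_rows):
    """Return [d_1(C), ..., d_k(C)] by brute force over r-dimensional subcodes."""
    words = nonzero_codewords(gen_rows)
    k = gf2_rank(words)
    hierarchy = []
    for r in range(1, k + 1):
        best = None
        for subset in combinations(words, r):
            if gf2_rank(subset) < r:
                continue  # the chosen codewords do not span an r-dimensional subcode
            union = 0
            for w in subset:
                union |= w  # support of the subcode = union of basis supports
            size = bin(union).count("1")
            if best is None or size < best:
                best = size
        hierarchy.append(best)
    return hierarchy


# Illustrative test case: a systematic generator matrix of a [7, 4] Hamming code.
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
print(weight_hierarchy(HAMMING_7_4))
```

The search is exponential in the code dimension, so it is only meant for toy codes where the chain (1) can be inspected directly.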

In  this  correspondence  we  will  study  the  weight  hierarchy  of 
a  convolutional  code.  After  formally  introducing  this  concept 
we  will  derive  in  the  next  section  several  of  the  basic  proper¬ 
ties.  In  particular  we  will  show  that  the  generalized  Hamming 
weights form an infinite strictly increasing sequence $d_i(C)$ of
positive  integers.  The  main  result  (Theorem  4)  is  a  general¬ 
ized  Griesmer  bound. 

II.  Definitions 

Let $\mathbb{F}_q$ be the Galois field of q elements, $\mathbb{F}_q[D]$ be the
polynomial ring over $\mathbb{F}_q$ and $\mathbb{F}_q(D)$ the ring of rational
functions. In the following it will be convenient to view elements
of $\mathbb{F}_q(D)$ as infinite (periodic) power series of the form
$\sum_{i=0}^{\infty} x_i D^i$, $x_i \in \mathbb{F}_q$. Let C be a rate k/n convolutional code
represented through a non-catastrophic encoder G(D). Without
loss of generality we will assume that the matrix G(D),
which is defined over $\mathbb{F}_q[D]$, is in row proper form; in other
words we will assume that the “high order coefficient matrix”
has full row rank. We also will assume that G(D) has ordered
row (Kronecker) indices

$$\nu_1 \ge \cdots \ge \nu_k$$

where the indices $\nu_i$ are formally defined through:

$$\nu_i = \max\{\deg(g_{ij}) \mid 1 \le j \le n\}, \quad i = 1, \ldots, k.$$

We will denote the memory, complexity and constraint length
of a convolutional code by m, c, and $\eta$ respectively. In terms
of the Kronecker indices we have: $m = \nu_1$, $c = \sum_{i=1}^{k} \nu_i$ and

$$\eta = n(\nu_1 + 1).$$

In  an  obvious  way  we  can  view  C  also  as  an  (infinite  di¬ 
mensional)  linear  Fq  vector  space.  Let 

$$\{u_1(D), \ldots, u_r(D)\}$$

*An  extended  version  of  this  paper  has  appeared  as  a  report: 
CWI  Report  BS-R9507,  Amsterdam,  The  Netherlands,  1995. 
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be  r  vectors  in  F J(D),  that  are  linearly  independent  over  Fq. 
Since G(D) has by assumption linearly independent rows it
follows that

$$\{u_1(D)G(D), \ldots, u_r(D)G(D)\} \subset C \subset \mathbb{F}_q^n(D)$$

defines an r-dimensional subspace of C and clearly every
r-dimensional subspace $U \subset C$ is of this form.

Definition 1 Let $U \subset C$ be a linear subspace of C. Then

$$\chi(U) := \{(i, j) \mid \exists\, (\textstyle\sum_i x_{i1} D^i, \ldots, \sum_i x_{in} D^i) \in U,\ x_{ij} \ne 0\}$$

is called the support of U and

$$d_r(C) := \min\{|\chi(U)| \mid U \subset C \text{ and } \dim U = r\}$$

is called the rth generalized Hamming weight of C.

Note that the generalized Hamming weights are well defined
for any positive integer r and not just for r = 0, ..., k
as is the case for block codes. Also note that if U is one
dimensional and $u \in U$ is any nonzero codeword then $|\chi(U)|$
is nothing else than the usual Hamming weight $w(u)$ of the
codeword u. In particular it follows in analogy to the block
code case that $d_1(C)$ is equal to the free distance of C.

III.  Basic  properties. 

Lemma 2 Let C be a convolutional code of rate k/n and memory
m. In order to compute $d_r(C)$ it is enough to consider
subspaces of the form

$$U = \operatorname{span}\{u_1(D)G(D), \ldots, u_r(D)G(D)\}$$

where $u_i(D) \in \mathbb{F}_q^k[D]$ and $\deg(u_i(D)) < (m^2 + mr)n$.

The following theorem is a natural generalization of Wei’s
monotonicity theorem [1, Theorem 1] for block codes.

Theorem 3 The generalized Hamming weights of a convolutional
code form a (strictly) increasing set of positive integers

$$0 = d_0(C) < d_1(C) < d_2(C) < \cdots$$

Theorem 4 Let C be a binary convolutional code of rate
k/n and having a basic encoder G(D) with Kronecker indices
$\nu = (\nu_1, \ldots, \nu_k)$. Let $\gamma$ be a positive integer and let

$K = \max(\gamma - \nu_i + 1, 0)$. Then the rth generalized Hamming

weight of C satisfies

(2)

Example 5 Let C be the rate 1/2, m = 2, $\eta = 6$ code with
generator matrix $G(D) = (D^2 + D + 1,\ D^2 + 1)$. Then one can
verify that $d_1(C) = 5$ and $d_i(C) = 2(i - 1) + \eta$, $\forall i > 1$.
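
The first two values in Example 5 can be checked numerically. The sketch below is an illustration only: it restricts the search to polynomial inputs of degree at most `max_deg` (a hypothetical truncation chosen for speed, much smaller than the bound of Lemma 2), stores binary polynomials as integer bit masks, and uses the fact that the support of a subspace is the union of the supports of a basis.

```python
from itertools import combinations

# Generators (1 + D + D^2, 1 + D^2); bit i of each mask is the coefficient of D^i.
G = (0b111, 0b101)


def gf2_poly_mul(a, b):
    """Multiply two binary polynomials stored as int bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def support(u):
    """Support {(i, j)} of the codeword u(D)G(D): output j is nonzero at time i."""
    positions = set()
    for j, g in enumerate(G):
        c = gf2_poly_mul(u, g)
        i = 0
        while c:
            if c & 1:
                positions.add((i, j))
            c >>= 1
            i += 1
    return positions


def gf2_rank(vectors):
    """Rank over GF(2) of integers read as bit vectors (incremental XOR basis)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def generalized_weight(r, max_deg=4):
    """Smallest support of an r-dimensional subspace spanned by inputs of degree <= max_deg."""
    inputs = range(1, 1 << (max_deg + 1))  # all nonzero input polynomials
    best = None
    for subset in combinations(inputs, r):
        if gf2_rank(subset) < r:
            continue  # inputs are linearly dependent over F_2
        size = len(set().union(*(support(u) for u in subset)))
        if best is None or size < best:
            best = size
    return best


print(generalized_weight(1), generalized_weight(2))
```

For r = 2 the value found by the truncated search is also a true minimum: every nonzero codeword has weight at least 5, so any two distinct nonzero codewords and their sum cover at least 15/2, hence at least 8, positions.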
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Abstract  —  In  list  decoding  (M-algorithm)  the  de¬ 
coder  state  space  is  typically  much  smaller  than  the 
encoder  state  space.  Hence,  it  can  happen  that  the 
correct  path  is  lost.  This  is  a  serious  kind  of  error 
event  that  is  typical  for  list  decoding.  In  this  paper 
two  upper  bounds  on  the  probability  of  correct  path 
loss  for  list  decoding  are  given.  For  fixed  convolu¬ 
tional  codes  counterparts  to  Viterbi’s  upper  bounds 
for  maximum-likelihood  decoding  of  fixed  convolu¬ 
tional  codes  are  proved.  Finally,  it  is  shown  that  there 
exists  a  fixed  convolutional  code  whose  probability  of 
correct  path  loss  when  decoded  by  list  decoding  sat¬ 
isfies  a  simple  expurgated  bound. 


I.  Introduction 

Viterbi  decoding  is  an  example  of  a  non-backtracking  de¬ 
coding  method  that  at  each  time  instant  examines  the  total 
encoder  state  space.  The  error  correcting  capability  of  the 
code  is  fully  exploited. 

In  list  decoding  (M-algorithm)  we  first  limit  the  resources 
of the decoder, then we choose an encoding matrix with a state
space  that  is  larger  than  the  decoder  state  space.  Thus,  as¬ 
suming  the  same  decoder  complexity,  we  use  a  more  powerful 
code  with  list  decoding  than  with  Viterbi  decoding.  A  list 
decoder  is  a  very  powerful  non-backtracking  decoding  method 
that  does  not  fully  exploit  the  error  correcting  capability  of 
the  code. 

List  decoding  is  a  breadth-first  search  of  the  code  tree.  At 
each  depth  only  the  L  most  promising  subpaths  are  extended, 
not  all,  as  is  the  case  with  Viterbi  decoding.  These  subpaths 
form  a  list  of  size  L. 

Since  the  search  is  breadth-first,  all  subpaths  on  the  list  are 
of  the  same  length  and  finding  the  L  best  extensions  reduces 
to  choosing  the  L  extensions  with  the  largest  values  of  the 
cumulative  Viterbi  metric. 
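
The breadth-first search described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a rate-1/2, memory-2 encoder with generators (1 + D + D^2, 1 + D^2), hard decisions from a BSC with crossover probability below 1/2 (so that the largest cumulative metric corresponds to the smallest Hamming distance), and the hypothetical function names `encode` and `list_decode`.

```python
def encode(bits):
    """Rate-1/2 convolutional encoder with generators (1 + D + D^2, 1 + D^2)."""
    s1 = s2 = 0  # the two most recent information bits
    out = []
    for u in bits:
        out += [u ^ s1 ^ s2, u ^ s2]
        s1, s2 = u, s1
    return out


def list_decode(received, list_size):
    """M-algorithm: keep only the list_size best subpaths at every depth."""
    # Each list entry is (Hamming distance so far, s1, s2, decoded bits).
    paths = [(0, 0, 0, [])]
    for t in range(0, len(received), 2):
        r1, r2 = received[t], received[t + 1]
        extensions = []
        for dist, s1, s2, bits in paths:
            for u in (0, 1):
                branch = ((u ^ s1 ^ s2) != r1) + ((u ^ s2) != r2)
                extensions.append((dist + branch, u, s1, bits + [u]))
        # All subpaths have the same length, so they can be compared directly.
        extensions.sort(key=lambda p: p[0])
        paths = extensions[:list_size]
    return paths[0][3]


message = [1, 0, 1, 1, 0, 0, 1, 0]
print(list_decode(encode(message), list_size=4) == message)
```

Unlike a Viterbi decoder, the list is not organised by encoder state, so several surviving subpaths may share a state and the correct path can be pushed off the list when the channel is noisy.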


II. The Correct Path Loss Problem

Since only the L best extensions are kept it can happen
that the correct path is lost. This is a very severe event that
causes many bit errors. If the decoder cannot recover a lost
correct path it is of course a “catastrophe”, i.e., a situation
similar to the catastrophic error propagation that can occur
when a catastrophic encoding matrix is used to encode the
information sequence.

The list decoder’s ability to recover a lost correct path
depends heavily on the type of encoder that is used. A systematic
encoder supports a spontaneous recovery.

III. Upper Bounds on the Probability of Correct Path Loss

The correct path loss on the ith step of a list decoding
algorithm is a random event $E_i$ which consists of deleting at
the ith step the correct codeword from the list of the L most
likely codewords.

To upper bound $P(E_i)$ we introduce the $\ell$-list generating
function for the path weights $T_\ell(D)$. Consider the trellis for
a rate R = b/c and memory m fixed convolutional code. At
a given depth consider the set of $2^{bm}$ paths of least weight
leading to the $2^{bm}$ states. Order these paths according to
increasing weights and let $w_j$ denote the weight of the jth
path ($w_0 = 0$). Introducing

$$T_\ell(D) = \sum_{j=\ell}^{2^{bm}-1} D^{w_j},$$

the $\ell$-list generating function of the path weights, we can prove
the following

Theorem 1 For the BSC with crossover probability $\epsilon$ and
fixed convolutional codes with $\ell$-list generating function $T_\ell(D)$
the probability of correct path loss is upper bounded by

$$P(E_i) \le \min_{1 \le \ell \le L} \left\{ \frac{T_\ell(D)}{L - \ell + 1} \right\} \Bigg|_{D = \sqrt{4\epsilon(1-\epsilon)}}$$

□

For the Gaussian channel we have the corresponding bound:

Theorem 2 For the channel with additive white Gaussian
noise (AWGN) with signal-to-noise ratio $E_b/N_0$ and fixed
convolutional codes of rate R with $\ell$-list generating function $T_\ell(D)$
the probability of correct path loss is upper bounded by

$$P(E_i) \le \min_{1 \le \ell \le L} \left\{ \frac{T_\ell(D)}{L - \ell + 1} \right\} \Bigg|_{D = e^{-R E_b/N_0}}$$

□

Furthermore, we can prove

Theorem 3 There exists a fixed convolutional code satisfying
the following expurgated bound:

$$P(E_i) \le L^{\log_2 \sqrt{4\epsilon(1-\epsilon)}} \cdot O(1).$$

□

1This work was supported in part by the Swedish Research
Council for Engineering Sciences under Grants 92-661 and 94-83.
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Abstract  —  Ra te-k/n  locally  invertible  convolutional 
encoders are defined. It is shown that a basic locally
invertible encoder is minimal-basic. Local invertibility
is used to re-derive Forney’s [1] upper and lower
bounds on the maximum number of consecutive all-zero
branches in a convolutional codeword. A time-domain
test for minimality [2] of an encoder is given.

I.  Introduction:  Time-Domain  Approach 

A rate-k/n convolutional encoder is characterized in the
time-domain by a discrete semi-infinite generator matrix G
[3]. Consider a finite section $G_{[m,\,m+\beta]}$ of G given by


This matrix represents a mapping between a $k(m + \beta + 1)$-bit
information subsequence $u_{[t-m,\,t+\beta]}$ and an $n(\beta + 1)$-bit
encoded subsequence $v_{[t,\,t+\beta]}$, given by
$v_{[t,\,t+\beta]} = u_{[t-m,\,t+\beta]}\, G_{[m,\,m+\beta]}$, where $t \ge 0$ is the time index and
$u_{[-m,\,-1]}$ is the starting state of the encoder.

A time-domain approach for analyzing rate-k/n convolutional
encoders has recently been presented in [4, 5, 6]. This
approach is based on performing elementary column operations
on a finite section $G_{[m,\,m+\nu]}$ of G, corresponding to
$(\nu + 1)$ output branches, to obtain its column canonical form
$G^{c}_{[m,\,m+\nu]}$. A matrix is in column canonical form if 1) all
all-zero columns appear as the left-most columns of the matrix,
and 2) the last nonzero element in a column is the only
nonzero element in its row, is a 1, and appears above the last
nonzero element in succeeding columns. The last nonzero element
in each column is called a pivot if it is the only nonzero
element in the column.

II.  Main  Results 

Definition 1 A rate-k/n convolutional encoder is locally
invertible if $G^{c}_{[m,\,m+\nu]}$ has a pivot in every nonzero row, i.e., if
all the nonzero rows in $G_{[m,\,m+\nu]}$ are linearly independent.

The time domain test for a rate-k/n encoder being basic is
the existence of k pivots in the last k rows of $G^{c}_{[m,\,m+\nu]}$.

Theorem 1 A basic rate-k/n convolutional encoder is
minimal-basic if and only if it is locally invertible.

1H. Koorapaty is with Dept. of Elec. & Comp. Eng., NCSU,
Raleigh,  NC  27695-7911,  USA. 


A fast time-domain algorithm for testing whether a rate-k/n
convolutional encoder is minimal-basic is as follows: 1) Compute
$G^{c}_{[m,\,m+\nu]}$, and 2) Inspect $G^{c}_{[m,\,m+\nu]}$ to ascertain that all
the nonzero rows have a pivot and that all the last k rows have
pivots. In [7], it is shown that the test for minimal-basicity of
an encoder requires a smaller section of G, corresponding to
only $\nu$ output branches.

Upper and lower bounds on the number of consecutive all-zero
outputs of a rate-k/n minimal-basic encoder starting in
a nonzero state, given in [1], may also be derived using the
property of local invertibility. If a basic encoder is locally
invertible at length $(\beta + 1)$, the rank of $G_{[m,\,m+\beta]}$ is equal to
the number of nonzero rows in it. For such an encoder, an
all-zero encoded subsequence $v_{[t,\,t+\beta]}$ cannot be produced by
a nonzero information subsequence $u_{[t-m,\,t+\beta]}$ since there is
a one-to-one mapping between the information and encoded
subsequences at length $(\beta + 1)$ [7]. Therefore, the required
bounds on the number of consecutive all-zero outputs coincide
with the bounds on the parameter $\beta$ at which an encoder may
achieve local invertibility. These bounds are derived in [7] and
are shown to coincide with Forney’s original bounds.

A rate-k/n encoder is minimal if and only if it has a polynomial
inverse in D and a polynomial inverse in $D^{-1}$ [2]. The
time-domain test for minimality is given by the following
theorem [7]:

Theorem 2 A rate-k/n convolutional encoder is minimal if
and only if $G^{c}_{[m,\,m+\nu]}$ contains k pivots in the band of k rows
operating on the information block $u_t$.
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Abstract  —  It  is  well  known  that  convolutional  codes 
are  discrete  time  linear  systems  defined  over  a  finite 
field.  In  this  short  correspondence  we  report  about 
some  important  first  order  representations  recently 
considered  in  the  systems  literature.  Using  this  de¬ 
scription  we  derive  a  new  factorization  of  the  well 
known  “sliding  block”  parity  check  matrix  often  en¬ 
countered  in  the  coding  literature. 

I. Generalized First Order Systems
Let $\mathbb{F}_q = \mathbb{F}$ be the Galois field with q elements and consider
an n x k matrix G(D) defined over the polynomial ring $\mathbb{F}[D]$.
G(D) generates an [n, k] convolutional code through:

$$C := \{w(D) \mid w(D) = G(D)\ell(D)\} \qquad (1)$$

Note that we follow the convention in systems theory by writing
all vectors as column vectors. From the point of view
of systems theory (1) defines an MA-representation, the
k-vectors $\ell(D)$ describe the set of latent variables and the set
of n-vectors w(D) describe the so called behavior, i.e. the
code words. In the sequel we will assume that G(D) is in column
proper form having column indices $\mu_1, \ldots, \mu_k$ and overall
constraint length $c := \sum_{i=1}^{k} \mu_i$. Then one has the following
equivalent first order description.

Theorem 1 There exist (c + n - k) x c matrices K, L and a
(c + n - k) x n matrix M (all defined over $\mathbb{F}$) such that (1) is
equivalently described through

$$C := \{w(D) \mid \exists\, x(D) : K x_{t+1} + L x_t = M w_t\}. \qquad (2)$$

where $w(D) = \sum_t w_t D^t \in \mathbb{F}^n[D]$, and $x(D) = \sum_t x_t D^t \in \mathbb{F}^c[D]$.
In addition the following minimality conditions are
satisfied:

M1: K has full column rank.

M2: The full size minors of $[DK + L \mid M]$ are coprime.

Remark 2 If G(D) is in addition a minimal encoder, then
one can show (compare with [1, 3]) that the c x c full size
minors of the pencil $DK + L$ are coprime.

II.  Duality 

Let H(D) be a (n - k) x n full rank polynomial matrix having
the property that H(D)G(D) = 0. H(D) describes a parity
check matrix for the convolutional code C introduced in (1)
through:

$$C = \{w(D) \mid H(D)w(D) = 0\}. \qquad (3)$$

Theorem 3 There exist c x (c + k) matrices P, Q and an n x (c + k)
matrix R (all defined over $\mathbb{F}$) such that (3) is equivalently
described through

$$\{w(D) \mid \exists\, z(D) : w_t = R z_t,\ P z_{t+1} = Q z_t\}. \qquad (4)$$

In addition the following minimality conditions are satisfied:


M1': P has full row rank.

M2': The full size minors of $[DP - Q]$ are coprime.

The minimality conditions (M1') and (M2') guarantee that
after a possible permutation of the external variables the
matrices P, Q, R in (4) have an equivalent description of the
form:

$$P = (I \;\; 0), \quad Q = (A \;\; B), \quad R = \begin{pmatrix} C & D \\ 0 & I \end{pmatrix} \qquad (5)$$

which in turn is equivalent to the representation:

$$x_{t+1} = A x_t + B u_t, \quad y_t = C x_t + D u_t, \qquad (6)$$

a well known description [2].
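
The recursion of an input/state/output description can be simulated directly over the binary field. The sketch below is illustrative only: the matrices A, B, C, D are hypothetical and chosen so that the state holds the two previous inputs of a memory-2 shift register; they are not taken from the paper, and the function names are assumptions.

```python
def mat_vec_gf2(matrix, vector):
    """Matrix-vector product over GF(2); matrix is a list of rows."""
    return [sum(a * b for a, b in zip(row, vector)) % 2 for row in matrix]


def add_gf2(x, y):
    """Componentwise sum of two vectors over GF(2)."""
    return [(a + b) % 2 for a, b in zip(x, y)]


def simulate(A, B, C, D, inputs, x0):
    """Iterate x_{t+1} = A x_t + B u_t, y_t = C x_t + D u_t and collect the y_t."""
    x = list(x0)
    outputs = []
    for u in inputs:
        outputs.append(add_gf2(mat_vec_gf2(C, x), mat_vec_gf2(D, u)))
        x = add_gf2(mat_vec_gf2(A, x), mat_vec_gf2(B, u))
    return outputs


# Hypothetical example: state x_t = (u_{t-1}, u_{t-2}), outputs
# y_t = (u_t + u_{t-1} + u_{t-2}, u_t + u_{t-2}).
A = [[0, 0], [1, 0]]
B = [[1], [0]]
C = [[1, 1], [0, 1]]
D = [[1], [1]]

print(simulate(A, B, C, D, inputs=[[1], [0], [0]], x0=[0, 0]))
```

A single 1 at the input produces the impulse response of the shift register, after which the state returns to zero.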

III.  Factorization  of  the  Sliding  Block  Matrix 

One way of studying convolutional codes is through
the use of the so called ‘sliding block matrix’ induced through
the parity check matrix H(D). In the sequel we provide a
factorization of this matrix. Let K, L, M be defined as in (2)
and define:

$$S = \begin{pmatrix} K & 0 & \cdots & 0 \\ L & K & & \vdots \\ 0 & L & \ddots & 0 \\ \vdots & & \ddots & K \\ 0 & \cdots & 0 & L \end{pmatrix}, \qquad T = \begin{pmatrix} M & 0 & \cdots & 0 \\ 0 & M & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & M \end{pmatrix} \qquad (7)$$

where we assume that both S and T consist of s + 1 vertical
blocks. Let U be a matrix with the property that

$$\operatorname{Ker} U = \operatorname{Im} S.$$

Theorem 4 $w(D) := \sum_{t=0}^{s} w_t D^t \in C$ if and only if

$$UT\,(w_0^{\top}, w_1^{\top}, \ldots, w_s^{\top})^{\top} = 0,$$

i.e. UT represents a factorization of the sliding block matrix
of order s.
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I.  On  the  Definition  of  Convolutional  Codes 
Let  1  <  k  <  n,  F  be  a  finite  field,  F(D)  be  the  field  of 
rational  function  in  D  over  F,  and  F((D))  be  the  field  of 
formal  Laurent  series  in  D  over  F. 

Definition 1 (Massey [1]) A rate k/n convolutional code over
F is a k-dimensional subspace of the n-dimensional (row) vector
space $F(D)^n$.

Definition 2 (Forney [2]) A rate k/n convolutional encoder
over F is a k-input n-output constant linear causal finite-state
sequential circuit. And a rate k/n convolutional code C over
F is the set of outputs of the sequential circuit.

An equivalent formulation of Definition 2 (cf. [3]) is

Definition 2' A rate k/n convolutional code over F is a
k-dimensional subspace of the n-dimensional (row) vector space
$F((D))^n$ with a basis consisting of n-tuples of polynomials (or
rational functions).

Clearly,  a  convolutional  code  in  the  sense  of  Definition  1  is 
a  subcode  of  a  convolutional  code  in  the  sense  of  Definition  2. 

Definition 3 (Dholakia [4]) A rate k/n convolutional code
over F is a k-dimensional subspace of the n-dimensional (row)
vector space $F((D))^n$.

Clearly, a convolutional code in the sense of Definition 2'
is a convolutional code in the sense of Definition 3. However,
we have

Proposition 1 There exist convolutional codes in the sense
of Definition 3 which are not convolutional codes in the sense
of Definition 2'.

Proof:  Let  f(D)  be  a  formal  Laurent  series  in  D  which  is  not 
ultimately  periodic,  and  let  C  be  the  1-dimensional  subspace 

$$F((D))\,(1, f(D))$$

of  F((D))2.  Then  C  is  a  rate  1/2  convolutional  code  in  the 
sense  of  Definition  3  but  not  a  convolutional  code  in  the  sense 
of  Definition  2'.  □ 

Corollary 2 There exist convolutional codes in the sense of
Definition 3 which cannot be realized by a constant linear causal
finite-state sequential circuit.

II.  On  the  Dual  Code  of  a  Convolutional  Code 

Let C be a rate k/n convolutional code in the sense of
Definition 2'. Define

$$C^{\perp} = \{v(D) \in F((D))^n \mid v(D)\,c(D)^{T} = 0 \ \ \forall c(D) \in C\}.$$

Proposition 3 Let C be a rate k/n convolutional code in the
sense of Definition 2'. Then $C^{\perp}$ is a rate (n - k)/n convolutional
code in the sense of Definition 2'.

Using the invariant factor theorem, Forney [2] actually proved this
proposition. A simple elementary proof can be given by using
Definition 1.


III.  A  Minimality  Criterion  of  Encoding 
Matrices 

Let G(D) be a k x n matrix of full rank with entries in
F(D). If G(D) is realizable and delayfree then G(D) is called
an encoding matrix of the convolutional code

$$C = \{v(D) = u(D)G(D) \mid u(D) \in F((D))^k\}$$

in the sense of Definition 2. For any $u(D) \in F((D))^k$, write

$$u(D) = u_{-m}D^{-m} + \cdots + u_{-1}D^{-1} + u_0 + u_1 D + u_2 D^2 + \cdots,$$

where $u_i \in F^k$. Define

$$u(D)P = u_{-m}D^{-m} + \cdots + u_{-1}D^{-1},$$
$$u(D)Q = u_0 + u_1 D + u_2 D^2 + \cdots.$$

The set

$$\{u(D)P\,G(D)\,Q \mid u(D) \in F((D))^k\}$$

is called the abstract state space of C relative to the encoding
matrix G(D). If its cardinality attains the minimum, G(D) is
called a minimal encoding matrix (cf. [3]).

Proposition 4 Let G(D) be an encoding matrix. Then the
following statements are equivalent.

(a) G(D) is a minimal encoding matrix.

(d) G(D) has a polynomial right inverse in D and a
polynomial right inverse in $D^{-1}$.

(e) For any v(D) = u(D)G(D) where $u(D) \in F((D))^k$,
if v(D) is polynomial in D then so is u(D), and if
v(D) is polynomial in $D^{-1}$ then so is u(D).

The equivalence of (a) and (d) was proved in [3]. Now the
equivalence of (d) and (e) is proved.
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Abstract  —  Some  improved  version  of  the  union 
bounds,  expressed  in  the  same  terms  is  proposed. 

Transmission of a binary information sequence over the BSC
with crossover probability 0 < p < 1/2 is considered. It is
assumed that a noncatastrophic time-invariant convolutional
encoder and a Viterbi decoder are used. There are two types of
performance characteristics that are usually used to describe
the probabilistic behavior of such a communication system.
Characteristics of the first type describe the stationary behavior of
the system (e.g. bit-error probability, average decoding delay,
etc.). Usually they are of main interest. Characteristics of the second
type describe the behavior of the system at the
initial moment (e.g. first-error event probability). The most
commonly used “union bounds” for upper bounding any of the
above-mentioned characteristics do not take into account an
essential difference between these two types of characteristics
[1,2]. We show that standard “union bounds” for stationary
characteristics can be considerably improved while preserving the
same form and terms.

Denote by $P_e$ the conditional probability that at any given
moment the edge will be decoded incorrectly provided that
the entire preceding semi-infinite information sequence was decoded
correctly.

Theorem 1. The conditional first-error event probability $P_e$
satisfies the inequality


a(w ,  l)Aw 

1  Atjj 


(1  -  P*)\ 


(1) 


where a(w,l) is the number of codepaths of weight w and length
l, and $A_w$ is the error probability when testing two codewords
of weights 0 and w [1].

Remarks. 1) Inequality (1) differs from a “standard”
union bound by the presence of the factors $(1 - P_e)^l$ on the right-hand
side of (1). As a result it gives a nontrivial (i.e. $P_e < 1$) upper
bound for any crossover probability p < 1/2, and this bound is
always tighter than the “standard” union bound (which works
only for some small p). 2) Inequality (1) can be expressed in
terms of the generating function T(D, L) with $L = 1 - P_e$. 3)
Inequality (1) also remains valid for some other channels (e.g.
Gaussian).

In the case of the bit-error probability $P_b$ we limit ourselves
here to the following result.

Theorem 2. There exists some critical value $p_0$ such that
if $p < p_0$, then $P_b < B$, where B is defined from the following
system of equations


where a(w,l,i) is the number of codepaths of weight w, length
l and information weight i.

Remarks. 1) It is possible to evaluate the critical value
$p_0$. 2) If $p > p_0$, then equation (2) will be replaced by
some similar equation. 3) Both theorems are based on some
recurrent relations and on a certain inequality from [3].
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I.  Introduction 

Neural networks have been used to tackle what might be
termed ‘empirical regression’ problems. Given independent
samples of input/output pairs $(x_i, y_i)$, we wish to estimate
$f(x) = E[Y \mid X = x]$. The approach taken is to choose an
approximating class of networks $\mathcal{N} = \{\eta(x; w)\}$ and within
that class, by an often complex procedure, choose an approximating
network $\eta(\cdot; w^*)$. The distance (in mean squared error)
of this network from f can be separated into two terms: one
for approximation or bias (choosing $\mathcal{N}$ large enough so that
some $\eta(\cdot; w^0)$, say, models f well) and one for estimation or
variance (how well the chosen $\eta(\cdot; w^*)$ performs relative to
$\eta(\cdot; w^0)$). We address the latter term.

II.  Problem  Statement 

Networks are parameterized by weight vectors $w \in W \subset R^d$
and take inputs $x \in R^k$. In classification, network output is
restricted to {0, 1} while for regression it may be any real number.
The complexity of the architecture $\mathcal{N}$ may be measured
by the number of weights d or by its Vapnik-Chervonenkis
(VC) dimension v. Performance of a network is measured by
$\mathcal{E}(w) = E(\eta(x; w) - y)^2$ and the optimal net $w^0$ minimizes
this. In practice, the law P is unknown so weights $w^* \in W$
are chosen using the training set $T = \{(x_i, y_i)\}_{i=1}^{n}$ by minimizing
$\nu_T(w) = \frac{1}{n} \sum_{i=1}^{n} (\eta(x_i; w) - y_i)^2$.

The question of determining the relation between architecture
complexity, estimation error, and training set size comes
down to finding n large enough so that for a given d (or v),
$\mathcal{E}(w^*) - \mathcal{E}(w^0) < \epsilon$ with high probability. We adopt this as
a definition of reliable generalization. We can avoid dealing
directly with the stochastically chosen network $w^*$ by noting
that the triangle inequality implies

$$0 \le \mathcal{E}(w^*) - \mathcal{E}(w^0) \le 2 \sup_{w \in W} |\nu_T(w) - \mathcal{E}(w)|$$

Vapnik [1] shows that $n = (9.2v/\epsilon^2) \log(8/\epsilon)$ is sufficient for
reliable generalization. In cases where $\nu_T(w^*) = 0$, this can
be lowered [2] to $n = (5.8v/\epsilon) \log(12/\epsilon)$, but both are orders
of magnitude higher than practice indicates.
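
To get a feeling for the size of the first bound, it can be evaluated for a modest architecture. The sketch below is illustrative: it assumes the logarithm is natural, and the function name and the example values (VC dimension 50, accuracy 0.1) are hypothetical.

```python
import math


def vapnik_sample_size(vc_dim, eps):
    """Evaluate n = (9.2 v / eps^2) log(8 / eps), assuming a natural logarithm."""
    return (9.2 * vc_dim / eps ** 2) * math.log(8.0 / eps)


# Hypothetical example: VC dimension 50 and accuracy eps = 0.1.
n = vapnik_sample_size(50, 0.1)
print(round(n))
```

For these values the bound asks for roughly two hundred thousand samples, and halving the accuracy parameter more than quadruples the requirement.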

III.  Approximations  via  Poisson  Clumping 

For the large n we anticipate, the central limit theorem
leads us to replace the original empirical process $\nu_T(w) - \mathcal{E}(w)$
with the corresponding zero-mean Gaussian process Z(w):

$$P(\|\,|\nu_T(w) - \mathcal{E}(w)|\,\| > \epsilon) \simeq P(\|\,|Z(w)|\,\| > b)$$

where we have set $b = \epsilon\sqrt{n}$ and used the notation $\|\cdot\|$ for
supremum over weights.

The Poisson clumping heuristic (PCH) [3] is a recently introduced
tool for finding such exceedance probabilities. The
PCH tells us that the region of weight space where Z(w) exceeds
level b is a group of clumps. The clump centers fall
according to a Poisson process and the size $C_b(w)$ of a clump
centered at w is chosen independently of all other clumps. The
PCH leads to

(1)

where $\bar{\Phi}$ is the complementary cdf of N(0, 1) and $\sigma^2(w)$ is the
variance of Z(w). Loosely, the overall exceedance probability
is a sum (integral) of the point exceedance probabilities,
each scaled according to the number of weights that have
exceedances with it.

This provides a means to get accurate approximations for
the exceedance probabilities when the level b is large. For example,
if network activation functions are twice differentiable
and the variance has a unique maximum $\sigma^2$ at $w \in W$, then
$n = d\sigma^2 K/\epsilon^2$ samples are sufficient for reliable generalization,
where K is determined by P and $\mathcal{N}$. Explicit results for the
problems of recognizing rectangles and halfspaces in $R^k$ can
also be obtained. These are again of order $d/\epsilon^2$ but with constants
far lower than previous upper bounds.

IV. Lower Bounds

These PCH-based estimates are of theoretical interest, but
in practice evaluation of the constants is not possible due to
ignorance of P. Now consider the following related tool for
obtaining rigorous lower bounds to exceedance probabilities of
Z(w), where for simplicity we normalize Z(w) by its standard
deviation $\sigma = \sigma(w)$.

$$P(\|Z(w)/\sigma(w)\| > b) = \int_W \bar{\Phi}(b)\, E[D_b^{-1} \mid Z(w)/\sigma > b]\, dw \ge \bar{\Phi}(b) \int_W \frac{1}{E[D_b \mid Z(w)/\sigma > b]}\, dw$$

where $D_b$ is the volume of $\{w : Z(w)/\sigma(w) > b\}$. Simple
computations link this to the correlation $\rho = \rho(w, w')$ via

$$E[D_b \mid Z(w)/\sigma > b] \simeq \int_W \bar{\Phi}((b/\sigma)\,\zeta)\, dw' \qquad (2)$$

with $\zeta = \zeta(w, w') = ((1 - \rho)/(1 + \rho))^{1/2}$.

This link provides the basis for estimating the exceedance
probability empirically, without knowledge of P. Using the
training set, compute $(y_i - \eta(x_i; w))^2$ at w and w' for all n
points. This yields an estimate of $\rho$ and in turn an estimate of
$\zeta$ which can be used to compute the integral (2). Simulations
for the examples of recognizing rectangles and halfspaces show
that reasonable estimates of sample size can be obtained in the
absence of analytical information about P and $\mathcal{N}$.
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I.  Introduction 

We study learning in a general class of machines which
return a (variable) linear form of a (fixed) set of nonlinear
transformations of points in an input space. A fixed machine
in this class accepts inputs X from an arbitrary input space
and produces scalar outputs

$$Y = \sum_{i=1}^{d} \psi_i(X)\, w_i^* + \xi = \psi(X)'w^* + \xi. \qquad (1)$$

Here, $w^* = (w_1^*, \ldots, w_d^*)'$ is a fixed vector of real weights
representing the target concept to be learned, for each i,
$\psi_i(X)$ is a fixed real function of the inputs, with $\psi(X) =
(\psi_1(X), \ldots, \psi_d(X))'$ the corresponding vector of functions,
and $\xi$ is a random noise term.

We suppose that the learner receives an i.i.d. random sample
of examples $(X_1, Y_1), \ldots, (X_n, Y_n)$ generated according
to the joint distribution on input-output pairs (X, Y)
induced through the medium of the (unknown) relation (1)
and a fixed (unknown) distribution on input-noise pairs
$(X, \xi)$. The goal of the learner is to infer a hypothesis
$w = (w_1, \ldots, w_d)'$ with small (mean-square) generalisation
error $E(Y - \psi(X)'w)^2$ on future random examples
(X, Y) generated independently of the training sample from
the same underlying distribution. Here E denotes expectation
with respect to the underlying probability distribution
generating the examples. Note that, as expected,

$$w^* = \arg\min_w E(Y - \psi(X)'w)^2.$$

II.  Results 

We develop a rigorous characterisation of the time-dynamics
of generalisation in this class of machines when
a finite sample of examples is available and training is carried
out by minimisation of the empirical (or training) error
$E_n(Y - \psi(X)'w)^2$ via gradient descent, where $E_n$ denotes
expectation with respect to the empirical distribution which
puts equal mass 1/n on each of the n random examples which
constitute the sample. More specifically, given the sample,
the batch-mode gradient descent algorithm provides an iterative
refinement $\{w(t), t \ge 0\}$ of a hypothesis weight vector
w(t) representing the true concept $w^*$. The sequence of
weight vector updates is specified recursively according to the
usual gradient formulation:

$w(0)$ is an arbitrary initial hypothesis in $\mathbb{R}^d$;
$w(t) = w(t-1) - \tfrac{1}{2}\epsilon \nabla E_n(w(t-1))$ $(t \ge 1)$.

In the recursion, the integer parameter $t$ denotes the update
epoch and the positive parameter $\epsilon$ controls the rate of
learning.
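
The batch-mode recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature map `features` (three polynomial terms), the target weights, the noise level and the helper `make_sample` are all hypothetical, and the step is taken to be $\epsilon/2$ times the gradient of the empirical squared error, which is an assumed reading of the update rule.

```python
import random

def features(x):
    # Hypothetical fixed nonlinear transformations psi_1, psi_2, psi_3.
    return [1.0, x, x * x]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def make_sample(w_star, n, noise_std, rng):
    # Draw n i.i.d. examples (X, Y) with Y = psi(X)'w* + noise.
    sample = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = dot(w_star, features(x)) + rng.gauss(0.0, noise_std)
        sample.append((x, y))
    return sample

def empirical_error(w, sample):
    # E_n (Y - psi(X)'w)^2: mass 1/n on each example.
    return sum((y - dot(w, features(x))) ** 2 for x, y in sample) / len(sample)

def gradient(w, sample):
    # Gradient of the empirical squared error with respect to w.
    g = [0.0] * len(w)
    for x, y in sample:
        psi = features(x)
        residual = y - dot(w, psi)
        for i in range(len(w)):
            g[i] -= 2.0 * residual * psi[i] / len(sample)
    return g

def train(sample, eps, epochs):
    # w(t) = w(t-1) - (eps/2) * grad E_n(w(t-1)), starting from w(0) = 0.
    w = [0.0, 0.0, 0.0]
    errors = [empirical_error(w, sample)]
    for _ in range(epochs):
        g = gradient(w, sample)
        w = [wi - 0.5 * eps * gi for wi, gi in zip(w, g)]
        errors.append(empirical_error(w, sample))
    return w, errors
```

With inputs in $[-1, 1]$ the largest eigenvalue of the empirical feature covariance is at most 3, so a rate such as `eps=0.5` keeps the training error non-increasing; the generalisation error discussed in the text would have to be tracked separately on fresh examples.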

1  Department  of  Electrical  Engineering,  University  of  Pennsyl¬ 
vania,  Philadelphia,  PA  19104.  The  work  of  the  first  two  authors 
was  supported  by  the  Air  Force  Office  of  Scientific  Research  under 
grant  F49620-93-1-0120. 

2Siemens  Corporate  Research,  Princeton,  NJ  08540. 


The  empirical  minimum  mean-square  estimate, 

\[ \hat{w} = \arg\min_w E_n(Y - \psi(X)'w)^2, \]

which  corresponds  to  the  estimate  obtained  in  the  limit  of 
training  over  an  infinity  of  time  steps,  is  unbiased  and  consis¬ 
tent.  Should  we  then  carry  training  out  to  its  limit?  Surpris¬ 
ingly,  perhaps,  the  answer  is  “No.”  Indeed,  we  show  analyti¬ 
cally  that  as  training  progresses  in  time  three  distinct  phases 
in generalisation dynamics are evidenced. In the first phase,
the generalisation error $E(Y - \psi(X)'w)^2$ (where $E$ denotes
expectation with respect to the unknown underlying distribution
generating the examples) decreases monotonically (keeping
pace with a corresponding decrease in the training error);
this phase is completed in $O(\log n)$ time steps where $n$ is the
number of examples. The behaviour grows more interesting in
the second phase where the generalisation error exhibits complex
dynamics and an optimal stopping time $t_{\mathrm{opt}}$ is evidenced
at which the smallest generalisation error obtains; the second
phase is also ephemeral and takes only $O(\log n)$ time steps.
Finally,  in  the  third  phase,  the  generalisation  error  increases 
monotonically  to  a  limiting  value  of  error;  this  phase  takes  the 
rest  of  time.  Thus,  best  generalisation  occurs  not  at  the  limit 
of  training  when  the  global  minimum  of  the  training  error  is 
achieved,  but  rather  after  a  finite  number  of  steps  of  the  order 
of  the  logarithm  of  the  sample  size. 

One  of  the  key  concepts  that  emerges  from  our  analysis  is 
the  formal  notion  of  the  effective  size  of  a  network.  This  is 
a  time-varying,  algorithm-dependent  quantity  which,  in  the 
limit  of  training  over  an  infinity  of  time  steps,  coincides  with 
the  VC-dimension  of  the  machine.  As  is  well  known,  a  salient 
characteristic  of  neural  networks  is  that  they  often  involve  a 
very  large  number  of  adjustable  parameters  as  compared  to 
traditional  statistical  models  (such  as  classification  and  re¬ 
gression  models)  with  a  resulting  large  VC-dimension.  For  a 
given  (small)  sample  of  fixed  size,  how  then  does  one  explain 
empirical  claims  reporting  valid  generalisation?  Our  results 
shed  light  on  this  puzzle:  stopping  learning  at  the  optimal 
time  results  in  a  network  with  small  complexity  in  the  sense 
that  its  effective  size  at  that  time  is  typically  substantially 
smaller  than  its  effective  size  in  the  limit  of  training  (the  VC- 
dimension).  More  generally,  we  show  that  the  generalisation 
error  of  the  machine  during  the  training  process  is  determined 
at  each  training  epoch  by  the  effective  size  of  the  machine  at 
that  epoch  rather  than  its  VC-dimension.  Our  analysis  pro¬ 
vides  a  formal  framework  within  which  optimal  stopping  can 
be  viewed  as  dynamically  fitting  machine  complexity  to  the 
sample  wherein  best  generalisation  obtains  when  effective  ma¬ 
chine  size  best  fits  the  sample  size.  Thus  we  rescue  the  pre¬ 
vailing  intuition  (Occam’s  razor)  from  its  impending  dilemma. 

The  study  of  generalisation  dynamics  leads  naturally  to  a 
new  network  size  selection  criterion  which  can  be  viewed  as 
a  generalisation  of  Akaike’s  information  criterion  to  cover  not 
just  network  complexity  (the  effective  machine  size  in  the  limit 
of  training)  but  the  time  evolution  of  the  learning  process  as 
well. 
Abstract - We consider the problem of inferring
a finite binary sequence $w^* \in \{-1,1\}^n$ from a random
sequence of half-space data $\{u^{(t)} \in \{-1,1\}^n :
\langle w^*, u^{(t)} \rangle \ge 0,\ t \ge 1\}$. In this context, we show that
a previously proposed randomised on-line learning algorithm
dubbed Directed Drift [1] has minimal space
complexity but an expected mistake bound exponential
in $n$. We show that batch incarnations of the
algorithm allow massive improvements in running
time. In particular, using a batch whose size is a constant
multiple of $\pi n \log n$ examples at each update epoch reduces
the expected mistake bound to $O(n)$ in a single-bit update mode,
while using a batch of $\pi n \log n$ examples at each update
epoch in a multiple-bit update mode leads to convergence
to $w^*$ with a constant (independent of $n$)
expected mistake bound.

I.  Introduction 

Write $\mathbb{B} = \{-1, 1\}$ for simplicity and let $\mathbb{B}^n = \{-1, 1\}^n$
denote the vertices of the binary $n$-cube. Let $w^* \in \mathbb{B}^n$
be some fixed (but unknown) vertex. Suppose we are provided
with a random labelled sequence of positive examples
$\{u^{(t)}, t \ge 1\}$ of $w^*$ obtained by independent sampling from
the uniform distribution on the positive half-space of vertices

\[ \mathbb{B}^n_+(w^*) = \{u \in \mathbb{B}^n : \langle w^*, u \rangle \ge 0\}. \]

Our goal is to infer the finite binary sequence $w^*$ in an efficient
(on-line) manner from the sample $\{u^{(t)}\}$.

II.  Directed  Drift 

Directed Drift [1] is a randomised, on-line learning algorithm
with minimal space complexity.

Algorithm D (Directed Drift). Given a confidence parameter
$\delta > 0$ and a sample of positive examples $\{u^{(t)}, t \ge 1\}$
generated by independent sampling from the uniform distribution
on $\mathbb{B}^n_+(w^*)$, the algorithm generates a hypothesis $w$
which, with confidence in excess of $1 - \delta$, coincides with the
concept $w^*$.

D1. [Initialise.] Set epoch $t \leftarrow 1$, confidence counter $T \leftarrow 0$,
and select an arbitrary initial hypothesis $w \in \mathbb{B}^n$.

D2. [Is the hypothesis consistent on the example?] Set $Y \leftarrow
\langle w, u^{(t)} \rangle$.

D3. [If it ain’t broke, don’t fix it.] If $Y \ge 0$, increment the
confidence counter $T \leftarrow T + 1$; if $T > \log \delta^{-1}$,
output the hypothesis $w$ and terminate the algorithm;
else go to step D5.

D4. [Update hypothesis.] Else (if $Y < 0$) set $T \leftarrow 0$, $J \leftarrow
\{j : w_j \ne u_j^{(t)}\}$ and pick a random index $j$ from the
uniform distribution on $J$. Set $w_j \leftarrow -w_j$ and leave
the other components of $w$ unchanged.

1This  work  was  supported  by  the  Air  Force  Office  of  Scientific 
Research  under  grants  F49620-93-1-0120  and  F49620-92-J-0344. 


D5. [Increment time and iterate.] Set $t \leftarrow t + 1$ and go back
to step D2.
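
Steps D1 to D5 translate almost line for line into code. The following Python sketch is an illustration, not the authors' implementation: the rejection sampler `sample_positive`, the use of the natural logarithm for the confidence threshold, and the `max_epochs` safeguard are assumptions added here so that the example is self-contained and always terminates.

```python
import math
import random

def inner(w, u):
    return sum(wj * uj for wj, uj in zip(w, u))

def sample_positive(w_star, rng):
    # Rejection-sample u uniformly from {u in {-1,1}^n : <w*, u> >= 0}.
    while True:
        u = [rng.choice((-1, 1)) for _ in w_star]
        if inner(w_star, u) >= 0:
            return u

def directed_drift(w_star, delta, rng, max_epochs=100000):
    n = len(w_star)
    # D1: arbitrary initial hypothesis, confidence counter T = 0.
    w = [rng.choice((-1, 1)) for _ in range(n)]
    T = 0
    threshold = math.log(1.0 / delta)  # assumed natural logarithm
    for _ in range(max_epochs):        # D5 is the loop itself
        u = sample_positive(w_star, rng)
        Y = inner(w, u)                # D2
        if Y >= 0:                     # D3: consistent, gain confidence
            T += 1
            if T > threshold:
                return w
        else:                          # D4: flip one disagreeing bit
            T = 0
            J = [j for j in range(n) if w[j] != u[j]]
            j = rng.choice(J)
            w[j] = -w[j]
    return w  # epoch budget exhausted (hypothetical safeguard)
```

Only the hypothesis and the counter are stored between epochs, which is the sense in which the space requirement is minimal. When $Y < 0$ the hypothesis disagrees with the example in more than half of its bits, so the set `J` is never empty.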

By a consideration of the equilibrium probability distribution
of the states of the finite Markov chain which represents
the system, we show that the algorithm has minimal
space complexity $2n$ and exponential time complexity
$\Omega(e^{0.139n})$.

Massive  improvements  in  running  time  result  if  the  algo¬ 
rithm  is  modified  to  run  in  batch  mode.  In  a  single  bit  update 
batch  mode,  Step  D4  is  replaced  by 

D4'. [Update hypothesis.] Else (if $Y < 0$) set $T \leftarrow 0$
and call an additional $m - 1$ examples $u^{(t+1)}, \ldots,
u^{(t+m-1)}$. Define the indicator functions

\[ I_k^{(s)} = \begin{cases} 1 & \text{if } w_k \ne u_k^{(s)}, \\ 0 & \text{if } w_k = u_k^{(s)}, \end{cases} \]

and select the index $j$ garnering the most votes: $j \leftarrow
\arg\max_k \sum_{s=t}^{t+m-1} I_k^{(s)}$. Set $w_j \leftarrow -w_j$ and leave the
other components of $w$ unchanged. Set $t \leftarrow t + m - 1$.

In  a  multiple  bit  update  batch  mode,  Step  D4  is  replaced  by 

D4''. [Update hypothesis.] Else (if $Y < 0$) set $T \leftarrow 0$
and call an additional $m - 1$ examples $u^{(t+1)}, \ldots,
u^{(t+m-1)}$. Define the indicator functions

\[ I_k^{(s)} = \begin{cases} 1 & \text{if } w_k \ne u_k^{(s)}, \\ 0 & \text{if } w_k = u_k^{(s)}. \end{cases} \]

Tally the votes $b_k = \sum_{s=t}^{t+m-1} I_k^{(s)}$ and order the
indices such that $b_{j_1} \ge b_{j_2} \ge \cdots \ge b_{j_n}$. Set
$w_j \leftarrow -w_j$ if $j \in \{j_1, \ldots, j_{\lfloor (1-Y)/2 \rfloor}\}$ and leave the
other components of $w$ unchanged. Set $t \leftarrow t + m - 1$.
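
The two batch updates differ only in how many of the top-voted bits are flipped. The sketch below is a hedged illustration: the function names are hypothetical, ties are broken in favour of the lowest index (an assumption), and in the multiple-bit mode the number of flips is taken to be $\lfloor (1 - Y)/2 \rfloor$, the smallest number of flips that makes the hypothesis consistent with the triggering example; that reading of the step is an assumption.

```python
def tally_votes(w, batch):
    # b_k = number of examples in the batch whose k-th bit disagrees with w_k.
    return [sum(1 for u in batch if u[k] != w[k]) for k in range(len(w))]

def single_bit_update(w, batch):
    # Flip only the bit that garners the most votes.
    b = tally_votes(w, batch)
    j = max(range(len(w)), key=lambda k: b[k])
    return [-wk if k == j else wk for k, wk in enumerate(w)]

def multi_bit_update(w, batch):
    # batch[0] is the misclassified example that triggered the update.
    Y = sum(wk * uk for wk, uk in zip(w, batch[0]))
    flips = (1 - Y) // 2  # assumed flip count floor((1 - Y) / 2)
    b = tally_votes(w, batch)
    order = sorted(range(len(w)), key=lambda k: -b[k])
    chosen = set(order[:flips])
    return [-wk if k in chosen else wk for k, wk in enumerate(w)]
```

For example, with `w = [1, 1, 1, 1, 1]` and the three examples `[-1, -1, -1, -1, 1]`, `[-1, -1, 1, 1, 1]`, `[-1, 1, 1, 1, -1]`, the votes are `[3, 2, 1, 1, 1]`; the single-bit rule flips bit 0, while the multiple-bit rule (here $Y = -3$, so two flips) flips bits 0 and 1.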

Relatively small batch sizes $m$ are needed. We show
that in a single-bit update batch mode, a batch size $m$ that is
a constant multiple of $\pi n \log n$ reduces the time complexity
of the algorithm to $O(n)$, while in a multiple-bit update batch
mode, a batch size of $m = \pi n \log n$ reduces the time complexity
of the algorithm to $O(1)$, independent of $n$.
Abstract - This paper describes a novel approach
to pattern recognition based on the matching of
coded patterns to feature vectors. Our intent is
to integrate three individual steps (data acquisition,
feature extraction and decision making) of a pattern
recognition problem and to solve them simultaneously
as a single problem. The proposed pattern recognition
method is illustrated by a numerical
character recognition problem.

Matching coded patterns to feature vectors in a pattern
recognition system is conceptually analogous to
matching a set of signals to a group in a
digital communication system. In order to get a set of
signals matched to a group it is necessary to set up a
correspondence between the linearity and the distance
measure. Such an arrangement allows us to replace the Hamming
distance measure by the Euclidean distance measure
[1]. We now define formally the matching of a set of
signals to a group (Definition 1) and the transitive group
(Definition 2).

Definition 1 [1]: A signal set $S$ is matched to a group
$G$ if there exists a mapping $h$ from $G$ onto $S$ such that, for
any $g_1$ and $g_2$ in $G$, $d(h(g_1), h(g_2)) = d(h(g_1^{-1} * g_2), h(e))$,
where $e$ denotes the identity element of $G$. A mapping $h$ satisfying
this condition will be called a matched mapping. Moreover,
if $h$ is one-to-one then its inverse, $h^{-1}$, will be called a
matched labeling.

Definition 2 [1]: Let $S$ be a set of signals and $f :
S \to S$ be an isometry. If $A$ is a group of transformations
of $S$ and $s$ is an element of $S$, then the orbit of $s$ under $A$
is the set $A(s) = \{f(s) : f \in A\}$. The transformation
group $A$ is called transitive if $A(s) = S$ for some $s \in S$
(and therefore for all $s \in S$).

Next we consider only the case of a set of signals of
order $2^n$. The first Sylow theorem, which guarantees
the existence of a group of order $2^n$, is as follows.

First Sylow Theorem [2]: Let $G$ be a finite group
of order $p^n m$, where $p$ is prime, $n \ge 1$ and $p$ does not
divide $m$. Then (1) $G$ has a subgroup of order $p^i$ for any
integer $i$, $1 \le i \le n$; (2) each subgroup $H$ of $G$ of order $p^i$
is normal in some subgroup of order $p^{i+1}$, for $1 \le i < n$.

The existence of a subgroup of order $2^n$ allows us to
form a group of $2^n$ orthogonal matrices which is capable
of generating a transitive group. It is worth mentioning
that the product of each element of the group of
matrices and a signal vector (feature vector) is again a
signal vector (feature vector).

Numerical pattern recognition: Each input pattern
(numerical character) is a 4-by-8 pixel rectangle.
Another way to look at each character pattern is that it
consists of eight “2-by-2 primitive patterns”. There are
in total eight distinct primitive patterns as shown below:

00  00  01  01  11  11  10  10 

00  11  01  10  11  00  10  01 

Therefore, the procedure for numerical character recognition
proposed here consists of primitive pattern recognition.
Next, we map the primitive patterns into a binary
linear code $Z = (000, 001, 011, 010, 111, 110, 100, 101)$,
which, in its turn, is matched to a set of feature vectors,
$S$. The set $S$ represents the eight vertices of a cube,
namely $s_1 = (1, 1, 1)$, $s_2 = (1, 1, -1)$, $s_3 = (1, -1, 1)$,
$s_4 = (1, -1, -1)$, $s_5 = (-1, 1, 1)$, $s_6 = (-1, 1, -1)$,
$s_7 = (-1, -1, 1)$ and $s_8 = (-1, -1, -1)$.

From those eight signal vectors of $S$, we can easily find
a group of orthogonal matrices which is a transitive
group. Signal $s_j$ and orthogonal matrix $B_i$ are related by
the transformation $T_{B_i} : s_j \to B_i s_j$. Note that $T_{B_i}$
transforms a signal vector into another signal vector. Solving
the set of transformations $T_{B_i}$ results in eight orthogonal
diagonal matrices. The set of matrices $\{B_i\}$ forms a
non-cyclic commutative group under matrix multiplication.
Moreover, these matrices define a transitive group
of orthogonal transformations.

It can be shown that there is a one-to-one correspondence
between the elements of the sets $Z$ and $B$. We represent
this one-to-one correspondence as $z_i \leftrightarrow B_i$, which
implies the existence of an isomorphism between the groups
$(Z, \oplus)$ and $(B, \cdot)$, denoted by $\varphi : (Z, \oplus) \to (B, \cdot)$. It
can be easily shown that $\varphi$ is bijective, and for any $z_1$,
$z_2 \in Z$, $\varphi(z_1 \oplus z_2) = \varphi(z_1) \cdot \varphi(z_2)$. Therefore, $Z$ and $B$ are
isomorphic.

Note that the matching between $Z$ and $S$ is a consequence
of the isomorphism between $Z$ and $B$. Define the
mapping $h : Z \to S$ as $z_i \to h(z_i) = T_{B_i}(s_j) = B_i s_j$.
The Euclidean distance between any two elements of $Z$
satisfies the relation $d(h(z_i), h(z_j)) = d(h(z_i^{-1} * z_j), h(e))$.
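
The isomorphism and the matched-mapping relation can be checked exhaustively for the eight codewords. The following Python sketch makes two assumptions that the text does not spell out: each diagonal matrix is represented by its diagonal, with code bit $b$ mapped to the entry $(-1)^b$, and the base signal is taken to be $s_1 = (1, 1, 1)$. Since every element of $(Z, \oplus)$ is its own inverse, $z_i^{-1} * z_j$ reduces to $z_i \oplus z_j$.

```python
from itertools import product

Z = list(product((0, 1), repeat=3))  # the group (Z, xor) of 3-bit words
S1 = (1, 1, 1)                       # assumed base signal vector s_1

def phi(z):
    # Diagonal of the matrix B_z; assumed correspondence b -> (-1)**b.
    return tuple((-1) ** b for b in z)

def xor(z1, z2):
    return tuple(a ^ b for a, b in zip(z1, z2))

def diag_product(b1, b2):
    # The product of two diagonal matrices is the entrywise product.
    return tuple(a * b for a, b in zip(b1, b2))

def h(z):
    # Matched mapping h(z) = B_z s_1.
    return tuple(b * s for b, s in zip(phi(z), S1))

def dist2(u, v):
    # Squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(u, v))

homomorphism = all(
    phi(xor(a, b)) == diag_product(phi(a), phi(b)) for a in Z for b in Z
)
matched = all(
    dist2(h(a), h(b)) == dist2(h(xor(a, b)), h((0, 0, 0))) for a in Z for b in Z
)
transitive = {h(z) for z in Z} == set(product((-1, 1), repeat=3))
```

Under this correspondence the squared Euclidean distance between two signal vectors equals four times the Hamming distance between their codewords, which is what lets one distance measure stand in for the other.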

It turns out that the primitive pattern classification
consists of identifying feature vectors in the feature
space (signal space). Such a geometric property of feature
vectors makes the decision procedure simple and
straightforward because the decision regions are symmetric.
Abstract - New training algorithms for fully recurrent
neural networks are presented. They are based on
estimates of the Hessian matrix. Simulation results show that
the algorithms yield satisfactory results.

I.  Introduction 

Recurrent neural networks, having every unit connected to
every other unit, are the most general case of neural
networks and are highly non-linear dynamical systems
exhibiting rich dynamical behavior. The architecture
is inherently dynamic and usually one-layered, since its
complex dynamics confer on it powerful representation
capabilities. Recurrent networks with the same structure can
display different dynamic behavior as a result of the use of
diverse learning algorithms. The network is determined
when its structure and learning rule are given, as the
network is truly a composition of two dynamical systems:
the transmission and adjusting systems. The total input-output
behavior is therefore a result of the interaction of both.
Hence, the importance of learning rules in recurrent neural
networks is readily understood. Learning algorithms used
for recurrent networks are usually based on the computation
of the gradient of a cost function with respect to the
weights of the network. There are few learning algorithms
applicable to general recurrent neural network
architectures, and the most representative is the so-called
RTRL (Real Time Recurrent Learning) algorithm [1]
(Williams and Zipser). This algorithm is truly on-line and is
of gradient descent type, although it is more computationally
expensive than other recurrent neural network algorithms
(e.g. backpropagation through time). However, this
undesired feature can be compensated by the fact that
general fully recurrent architectures usually use far fewer
neurons than backpropagation structures.

This paper proposes two new algorithms for fully
recurrent neural networks using Hessian information
(second derivatives of the cost function with respect to the
parameters). The algorithms use estimates of the Hessian
matrix with different degrees of computational
complexity. Both algorithms use a matrix that is computed
recursively on-line, with elements based on the sensitivity
parameter as defined by Williams and Zipser [1]. The
second algorithm uses a less computationally demanding estimate
based on a diagonal matrix approximation of the Hessian
matrix, inspired by the Hessian matrix of the first algorithm.
The idea of using a diagonal matrix approximation of the
Hessian matrix is not new and was pursued by Becker and
Le Cun in a backpropagation feedforward architecture [2].
These methods are known as pseudo-Newton algorithms
and have the advantage of faster learning. They
re-scale the learning rate of each weight dynamically to
match the curvature of the cost function with respect to that
weight.
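
The per-weight rescaling behind a pseudo-Newton step can be sketched as follows. This is the generic diagonal-Hessian update, shown on a toy quadratic cost, and not the recurrent-network algorithms proposed in the paper; the function name, the step size `eta` and the small regulariser `mu` are hypothetical choices.

```python
def pseudo_newton_step(w, grad, hess_diag, eta=1.0, mu=1e-8):
    # Each weight gets its own effective learning rate eta / (|h_ii| + mu),
    # so directions of high curvature take proportionally smaller steps.
    return [
        wi - eta * gi / (abs(hi) + mu)
        for wi, gi, hi in zip(w, grad, hess_diag)
    ]

# Toy cost f(w) = 0.5 * sum(a_i * w_i**2) with very different curvatures.
curvatures = [1.0, 100.0]
w0 = [1.0, 1.0]
grad0 = [a * wi for a, wi in zip(curvatures, w0)]
w1 = pseudo_newton_step(w0, grad0, curvatures)
```

On this separable quadratic a single step lands essentially at the minimum, whereas plain gradient descent would need a learning rate below 0.02 to remain stable along the steep direction and would then crawl along the shallow one.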

II.  Conclusions 

Experiments were done to compare the proposed
learning algorithms with existing ones (e.g. RTRL and
pseudo-Newton) in the presence of noise. The new
algorithms had shorter learning periods, and the first
proposed one was the fastest, at the cost of a higher
computational complexity. The first proposed algorithm can
still be an attractive alternative because its high computing
demands can be compensated by the use of very small fully
connected neural networks. There are some engineering
applications that may use as few as two or three fully
connected neurons. The proposed algorithm is being applied to
neural channel equalizers by the author. The availability of
prior information could reduce the computing demands of
on-line learning methods for recurrent neural networks.
Alternatives in this direction are being studied by the author
to extend or improve the algorithms.
Artificial neural networks (ANN’s) have been successfully
applied  in  the  fields  of  signal  processing  and  pattern  recogni¬ 
tion.  In  recent  years,  efforts  have  been  made  to  design  ANN 
decoders  for  error  control  codes.  Although  the  general  decod¬ 
ing  problem  can  be  viewed  as  a  form  of  pattern  recognition 
(PR),  the  information  capacity  in  an  error  control  code  is  far 
more  extensive  than  that  contained  in  most  PR  problems. 
Because  of  this,  neural  net  training,  a  popular  design  tool  for 
ANN,  has  not  fared  well  in  ANN  decoders.  So  far,  trained 
ANN  decoders  are  limited  to  very  small  codes  like  the  (7,4) 
Hamming  code  and  convolutional  codes  with  no  more  than 
2  memory  elements.  Meanwhile,  algebraic  structures  of  the 
error  control  codes  are  not  efficiently  used  in  trained  ANN  de¬ 
coders,  resulting  in  inferior  performance  relative  to  that  of  the 
conventional  decoders.  For  these  reasons,  the  design  of  ANN 
decoders  has  become  a  process  of  “neuralizing”  the  existing 
digital  decoding  algorithms  which  have  themselves  been  de¬ 
rived  by  fully  exploiting  the  algebraic  properties  of  the  codes. 
The  decoding  process  can  be  maximally  parallelized  by  neu¬ 
ral  nets,  which  greatly  increases  the  decoder  throughput.  Such 
ANN  decoders  have  been  successfully  designed  for  many  im¬ 
portant  block  codes,  such  as  Hamming  codes,  the  (24,12)  Go- 
lay  code  and  the  (32,16)  QR  code  [1]. 

This  paper  presents  an  ANN  Viterbi  decoder  for  convolu¬ 
tional  codes.  In  the  past,  Viterbi  decoders  have  always  been 
implemented  using  digital  circuits.  The  speed  of  these  digital 
decoders  is  directly  related  to  the  amount  of  parallelism  in  the 
design.  As  the  constraint  length  of  the  code  increases,  paral¬ 
lelism  becomes  problematic  due  to  the  complexity  of  the  de¬ 
coder.  In  this  work  it  is  shown  that  the  register  exchange  type 
[2]  of  VA  can  be  completely  represented  by  an  ANN  structure. 
However,  for  large  decoding  depth  F,  the  required  dynamic 
range  goes  far  beyond  what  an  analog  neuron  can  provide. 
Since  the  register  exchange  operation  is  digital  in  essence,  it 
is  natural  to  adopt  a  hybrid  design,  which  is  shown  in  Figure 
1  for  a  standard  rate-1/2  code  with  2  memory  elements. 
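
For readers unfamiliar with the register-exchange form of the Viterbi algorithm, the following Python sketch shows the idea in ordinary software. It is a simplified illustration and not the hybrid analog/digital design of the paper: it assumes the generator polynomials (7, 5) in octal for the rate-1/2 code with 2 memory elements, uses hard decisions with a Hamming branch metric instead of analog correlation, and keeps whole survivor lists rather than fixed-length shift registers.

```python
M = 2                  # number of memory elements
G = (0b111, 0b101)     # assumed generator polynomials (7, 5)

def parity(x):
    return bin(x).count("1") & 1

def encode(bits):
    # Rate-1/2 convolutional encoder; newest bit sits in the top position.
    state, out = 0, []
    for b in bits:
        reg = (b << M) | state
        out.append(tuple(parity(reg & g) for g in G))
        state = reg >> 1
    return out

def viterbi_register_exchange(received):
    n_states = 1 << M
    inf = float("inf")
    metric = [0.0] + [inf] * (n_states - 1)      # start in the zero state
    registers = [[] for _ in range(n_states)]    # one survivor per state
    for r in received:
        new_metric = [inf] * n_states
        new_registers = [None] * n_states
        for s in range(n_states):
            if metric[s] == inf:
                continue
            for b in (0, 1):
                reg = (b << M) | s
                nxt = reg >> 1
                expected = tuple(parity(reg & g) for g in G)
                # Add-compare-select with a Hamming branch metric.
                m = metric[s] + sum(e != x for e, x in zip(expected, r))
                if m < new_metric[nxt]:
                    new_metric[nxt] = m
                    # Register exchange: copy the survivor, append the bit.
                    new_registers[nxt] = registers[s] + [b]
        metric, registers = new_metric, new_registers
    best = min(range(n_states), key=lambda s: metric[s])
    return registers[best]
```

Because each state carries its own decoded sequence forward, no traceback is needed: the output is read directly from the register of the best state.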

The  analog  part  of  the  design  implements  the  input  cor¬ 
relation  and  path  selection,  as  well  as  a  scaling  algorithm  to 
keep  each  neuron  holding  the  partial  metric  from  saturating. 
The inputs to the decoder are $r_0$ and $r_1$ from the binary
signalling AWGN channel. All connection gains are +1 unless
marked otherwise. The synchronization circuit is not shown
in the figure to preserve clarity. The structure in Figure 1 can
be easily extended to rate-$k/n$ convolutional codes with $M$
memory elements.

The complexity of a locally connected neural network is
characterized by the number of neurons, $N$. In general $N$ is
found to be

\[ N = 2^M (2^{k+2} - 2) + 2^n + 1, \]

which gives $N = 389$ for a rate-1/2 code with $M = 6$. The neurons
can be realized using operational amplifiers (Op-Amps).
If each Op-Amp contains 20 transistors, the network will have
fewer than 8,000 transistors. On the other hand, a fully digital
implementation for the same code requires 50,000 transistors
just for ACS operations [3].

Fig. 1: The ANN Viterbi decoder

Some other advantages of the ANN Viterbi decoder are:

•  The  operations  of  the  ANN  decoder  are  fully  parallel. 

•  All  connections  have  unit  gain,  which  eliminates  weight 
considerations  in  VLSI  implementation. 

•  The  network  is  only  locally  connected. 

•  The  characteristics  of  neurons  are  very  simple  to  imple¬ 
ment. 

•  The  modularity  brought  by  the  hybrid  design  allows  fur¬ 
ther  improvements  by  using  more  sophisticated  memory 
management  techniques. 
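
The neuron-count formula and the transistor estimate quoted earlier are easy to check numerically. In this small sketch the function name is hypothetical, and the figure of 20 transistors per Op-Amp is the one given in the text.

```python
def neuron_count(k, n, M):
    # N = 2^M (2^(k+2) - 2) + 2^n + 1 for a rate-k/n code
    # with M memory elements.
    return 2 ** M * (2 ** (k + 2) - 2) + 2 ** n + 1

N = neuron_count(k=1, n=2, M=6)   # rate-1/2 code with M = 6
transistors = 20 * N              # 20 transistors per Op-Amp
```

For the rate-1/2, $M = 6$ code this gives 389 neurons and 7,780 transistors, consistent with the bound of 8,000 stated above.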

U.S.  and  foreign  patents  are  currently  pending. 
Abstract - This paper proposes a new robust hybrid isolated
word speech recognition system which is based on the
improved quantization accuracy of FVQ, the strength of
HMM in modelling stochastic sequences, and the non-linear
classification capability of MLP neural networks. Thus the
proposed FVQ/HMM/MLP approach effectively combines the
relative contributions of codebook-dependent fuzzy
distortion measures with model-dependent maximum
likelihood probability information. Computer simulation
results clearly indicate the superior recognition accuracy
of the FVQ/HMM/MLP approach when
compared to that obtained from FVQ/HMM or FVQ/MLP
schemes.

I.  INTRODUCTION 

The system employs N FVQ codebooks and N HMM models.
Given an input word, each FVQ codebook effectively produces an
associated distortion measure $d(O, W_j)$. In addition, an FVQ
output vector is interpreted as a probability mass vector which is
accepted by the associated HMM process to yield a maximum
likelihood probability $P(O/W_j)$. The above measures can be used
directly as inputs to an MLP classifier or can be combined to form
a hybrid measure which is then presented to the MLP network. In
our noisy speech recognition study the systems under examination
are trained on clean speech. Recognition performance is then
measured with the input signal corrupted by vehicle or white
acoustic noise at different Signal to Noise Ratio (SNR) values.

II. SYSTEM DESCRIPTION

The FVQ/HMM/MLP algorithm employs N VQ codebooks and N
HMMs. Each input set of LSP coefficients is then Fuzzy Vector
Quantised by C-entry codebooks $CB_j$, j = 1, 2, ..., N. Thus an
input word $W_j$, represented by a series of $T_j$ LSP
vectors, is vector quantised in "parallel" by N codebooks and a
Fuzzy Distortion Measure $FD_j$ [1] is obtained from each VQ
process applied to the input word. At the same time, the N parallel
codebooks yield N observation sequences which drive N
corresponding HMM processes, $HMM_j$, j = 1, 2, ..., N. Thus a
maximum likelihood probability $P(O/W_j)$ is obtained from each
HMM process in response to an input word. The $FD_j$ and
$P(O/W_j)$ measures can be combined into a simple measure [1] and
then presented to the MLP network, whose output OUT(j), j = 1, 2,
..., N, assumes values in the range 0 < OUT(j) < 1. The system
selects the unknown input word to be the jth vocabulary word
if OUT(j) = $\max_i$ OUT(i), i = 1, 2, ..., N. The three-layer
network used employs P hidden nodes and N input nodes.


Alternatively,  the  N  FDj  and  N  P(  O/Wj  )  measures  can  be  used  as 
inputs  to  an  MLP  classifier  having  2N  inputs  and  N  outputs.  Thus 
the  relative  contribution  of  the  above  two  similarity  measures, 
towards  a  correct  classification,  is  determined  by  the  neural 
network.  This  flexible  and  powerful  method,  for  "fusing"  the 
output  of  the  FVQ  and  HMM  parts  of  the  system,  has  been  used  in 
the  computer  simulation  experiments  discussed  in  the  next 
section. 
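
Two pieces of this pipeline are simple enough to sketch: turning codebook distances into a probability mass vector, and selecting the vocabulary word from the MLP outputs. The Python below is an assumption-laden illustration: it uses the standard fuzzy c-means membership formula with fuzziness `m`, which may differ from the FVQ of [1], and both function names are hypothetical.

```python
def fuzzy_memberships(x, codebook, m=2.0):
    # Squared Euclidean distance from x to every codeword.
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook]
    if 0.0 in d2:
        # x coincides with a codeword: put all the mass on the exact hits.
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        total = sum(hits)
        return [v / total for v in hits]
    power = 1.0 / (m - 1.0)
    # Fuzzy c-means membership; the values are positive and sum to one,
    # so the vector can be read as a probability mass vector.
    return [1.0 / sum((di / dk) ** power for dk in d2) for di in d2]

def select_word(outputs):
    # Index j of the largest MLP output OUT(j).
    return max(range(len(outputs)), key=lambda j: outputs[j])
```

A hard vector quantiser would keep only the nearest codeword; the fuzzy version retains graded evidence for every codeword, which is the information passed on to the HMM stage.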

III. EXPERIMENTS AND RESULTS
The  performance  of  the  proposed  FVQ/HMM/MLP  scheme  has 
been  evaluated,  and  compared  with  that  obtained  from 
conventional  FVQ/MLP  [2]  and  FVQ/HMM  systems  [1].  Two  sets 
of  input  words  were  used  in  these  experiments:  set  one  is  based  on 
the  ten  English  digit  words,  zero  to  nine,  whereas  set  two  employs 
the  26  English  letters.  Figure  1  shows  the  performance  of  the 
FVQ/HMM/MLP,  FVQ/MLP  and  FVQ/HMM  systems  operating 
on  the  second  set  of  input  words,  for  different  input  SNR  values, 
when  speech  is  corrupted  by  car  noise. 


Fig.  1.  Recognition  performance  of  the  FVQ/HMM,  FVQ/MLP 
and  FVQ/HMM/MLP  for  the  car  noise  (N  =  26  ). 

IV.  CONCLUSIONS 

The proposed speech recognition system provides superior
performance, under noisy input conditions, when compared to
conventional schemes [1], [2]. The system achieves recognition
rates of 98.33% and 90% at 30 dB and 20 dB SNR values
respectively, when operating on set one of input words.
Abstract - A stochastic model is established for
fully-connected recurrent neural networks with sigmoid
units based on Gibbs distributions. The EM
(Expectation-Maximization) algorithm with a mean
field approximation is then applied to train recurrent
networks through hidden state estimation. The resulting
EM-based algorithm, which reduces training
the original recurrent network to training a set of
individual feedforward neurons, simplifies the original
training process and reduces the training time.

I.  Introduction 

The goal of this work is twofold. First, we would like to
develop a stochastic model to train recurrent networks with
sigmoid units. Second, through the model developed, we will
derive a novel training algorithm for recurrent networks which
drastically simplifies the original training process.

II.  A  Stochastic  Model 

Consider a recurrent network with $d$ inputs, $L$ hidden units
and one linear output unit. Let $x(n) \in \mathbb{R}^d$, $t(n+1) \in \mathbb{R}^1$ and
$z(n+1) \in \mathbb{R}^L$ be an input, a desired output of the recurrent
network and a desired output of the hidden units (also called
desired hidden states) at the $n$-th (and $(n+1)$-th) epoch, respectively.
Let $\{x\}$, $\{z\}$ and $\{t\}$ denote all the inputs, desired hidden states
and outputs up to epoch $N$. Let $\Theta$ contain all the parameters
of the recurrent network, namely the weights
between inputs and hidden units, between hidden units, and
between hidden and output units (the last denoted $w^{(2)}$).

A stochastic model can then be established through a
conditional probability model based on the Markov property
of recurrent networks¹, i.e.

\[ P(\{z\} \mid \{t\}, \{x\}, z(1), \Theta) = \prod_{n=1}^{N} P(z(n+1) \mid z(n), t(n+1), x(n), \Theta) \]

and

\[ P(\{z\}, \{t\} \mid \{x\}, z(1), \Theta) = \prod_{n=1}^{N} P(z(n+1), t(n+1) \mid z(n), x(n), \Theta), \]

where $z(1)$ denotes the initial desired hidden states. Furthermore,
Gibbs distributions can be used, which eventually lead
to the following probabilities:

\[ P(z(n+1) \mid z(n), x(n), \Theta) = A \exp\!\left(-\tfrac{1}{2} (z(n+1) - \bar{z}(n+1))^T \Sigma^{-1} (z(n+1) - \bar{z}(n+1))\right) \]

and

\[ P(z(n+1), t(n+1) \mid z(n), x(n), \Theta) = B \exp\!\left(-\lambda_1 E_1(n+1) - \lambda_2 E_2(n+1)\right), \]

where $E_1(n+1) = \| z(n+1) - h(n+1) \|^2$ and
$E_2(n+1) = (t(n+1) - z(n+1)^T w^{(2)})^2$. $\lambda_1$, $\lambda_2$, $A$ and $B$ are
constants. $\bar{z}(n+1)$ is the expectation of $z(n+1)$ at the $(n+1)$-th
epoch when given $x(n)$, $z(n)$ and $\Theta$. $h(n+1)$ is the actual
hidden output.

III.  EM  Algorithm  For  Recurrent  Networks 

Once the stochastic model is developed, a new training
algorithm is derived through the EM algorithm [2]. The essence of
the EM algorithm is that certain hidden variables (missing data)
can be introduced to simplify a maximum likelihood problem.

1  This  property  comes  from  the  fact  that  the  outputs  of  a  recur¬ 
rent  network  and  its  hidden  units  at  current  epoch  only  depend  on 
the  actual  outputs  of  hidden  units  at  previous  epoch. 


For our case, the hidden variables are the desired hidden states $z(n+1)$, which serve as missing data, whereas the incomplete data consist of the pairs $\{x(n), t(n+1)\}$. Using similar derivations as in [3], we can obtain the expected log likelihood

$$Q(\Theta \mid \Theta^p) = \int_{\{z\}} P(\{z\} \mid \{t\}, \{x\}, \Theta^p) \ln P(\{z\} \mid \{t\}, \{x\}, \Theta),$$

where $\Theta$ and $\Theta^p$ are the new parameters and the parameters at the previous step, respectively.

Since  Q(0  |  0P)  is  difficult  to  evaluate  directly,  a  mean- 
field  approximation^]  is  used  to  estimate  Q(0  |  ©p),  which 
eventually  leads  to  an  EM-based  algorithm  for  recurrent  net¬ 
works. 

E-step: Evaluate the expected desired hidden states recursively through $Ez_j(n+1) = h_j(n+1) + \alpha e(n+1)$, where $h_j(n+1) = g(x(n)^T w_j^{(0)} + Ez(n)^T w_j^{(1)})$ for $1 \le j \le L$. $g(u)$ is a sigmoid function, $e(n+1) = t(n+1) - h(n+1)^T w^{(2)}$, and $\alpha$ is a constant.

M-step: Using the expected desired hidden states obtained at the E-step as targets for recurrent hidden neurons, find new parameters through two steps.

(a) Find the $w^{(0)}$'s and $w^{(1)}$'s by minimizing the difference between expected desired and "actual" hidden states $h(n)$: $\sum_n \| Ez(n) - h(n) \|^2$.

(b) Find $w^{(2)}$ by minimizing the difference between desired outputs of the network and weighted expected desired hidden targets: $\sum_n (Ez(n+1)^T w^{(2)} - t(n+1))^2$.

The  algorithm  will  iterate  between  the  E-  and  M-steps  until 
a  convergence  criterion  is  achieved. 

Notice that (a) and (b) are equivalent to training individual feedforward neurons and can be solved using a fast training algorithm given in [1].
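The E-step recursion described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the names `w_in`, `w_rec` and `w_out` are hypothetical labels for the input-to-hidden, hidden-to-hidden and hidden-to-output weights, the network has a single scalar output, and the logistic function is assumed for the sigmoid $g$.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, assumed here for g(u)."""
    return 1.0 / (1.0 + math.exp(-u))


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def e_step(xs, ts, w_in, w_rec, w_out, alpha, ez_init):
    """Run the E-step recursion over all epochs.

    xs[n]     input vector x(n)
    ts[n]     scalar target t(n+1) that follows x(n)
    w_in[j]   input-to-hidden weights of hidden unit j (hypothetical name)
    w_rec[j]  hidden-to-hidden weights of hidden unit j (hypothetical name)
    w_out     hidden-to-output weights (hypothetical name)
    alpha     constant multiplying the output error
    ez_init   initial expected desired hidden state
    Returns the list of expected desired hidden states Ez(n+1).
    """
    ez = list(ez_init)
    trajectory = []
    for x, t in zip(xs, ts):
        # h_j(n+1) = g(x(n)^T w_in[j] + Ez(n)^T w_rec[j])
        h = [sigmoid(dot(x, w_in[j]) + dot(ez, w_rec[j]))
             for j in range(len(w_in))]
        # e(n+1) = t(n+1) - h(n+1)^T w_out
        e = t - dot(h, w_out)
        # Ez_j(n+1) = h_j(n+1) + alpha * e(n+1)
        ez = [hj + alpha * e for hj in h]
        trajectory.append(ez)
    return trajectory


# Tiny example: two hidden units, all weights zero, so every h_j is g(0).
demo = e_step(
    xs=[[1.0, 0.0], [0.0, 1.0]],
    ts=[1.0, 0.0],
    w_in=[[0.0, 0.0], [0.0, 0.0]],
    w_rec=[[0.0, 0.0], [0.0, 0.0]],
    w_out=[0.0, 0.0],
    alpha=0.1,
    ez_init=[0.0, 0.0],
)
```

The returned states would then serve as regression targets for the hidden units in step (a) of the M-step; the fast per-neuron training routine of [1] is not reproduced here.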

IV.  Simulation  Results 

Learning a teacher recurrent network is chosen as an initial test problem. When RTRL (back-propagation algorithm for recurrent nets), BPTT (back-propagation through time) and our algorithm were required to achieve a similar mean-square error, our algorithm could be at least 10 times faster.

Abstract - In this paper we provide sufficient conditions for convergence of a general class of alternating estimation-maximization (EM) type continuous-parameter estimation algorithms with respect to a given norm.

I.  Introduction 

Let $\theta = [\theta_1, \ldots, \theta_p]^T$ be a real parameter residing in an open subset $\Theta$ of the $p$-dimensional space $R^p$. Given a general function $Q : \Theta \times \Theta \to R$ and an initial point $\theta^0 \in \Theta$, consider the following recursive algorithm, called the A-algorithm:

A-algorithm: $\theta^{i+1} = \arg\max_{\theta \in \Theta} Q(\theta, \theta^i)$. (1)

If there are multiple maxima, then $\theta^{i+1}$ can be taken to be any one of them. Let $\theta^* \in \Theta$ be a fixed point of (1).

The A-algorithm contains a large number of popular iterative estimation algorithms such as: ML-EM algorithms (e.g., Dempster, Laird, and Rubin (1977)), the penalized EM algorithm (e.g., Hebert and Leahy (1989)), and EM-type algorithms implemented with E-step or M-step approximations (e.g., Antoniadis and Hero (1994), Green (1990)).


The following convergence theorem establishes that, if $R_+$ is not empty, the region in Definition 1 is a region of monotone convergence in the norm $\|\cdot\|$ for an algorithm of the form (1). It can be shown that $R_+$ is non-empty for sufficiently regular problems (Hero and Fessler (1995)).

Theorem 1 Let $\theta^* \in \Theta$ be a fixed point of the A-algorithm (1), where $\theta^{i+1} = \arg\max_{\theta \in \Theta} Q(\theta, \theta^i)$, $i = 0, 1, \ldots$. Assume: i) for all $\bar\theta \in \Theta$, the maximum $\max_\theta Q(\theta, \bar\theta)$ is achieved on the interior of the set $\Theta$; ii) $Q(\theta, \bar\theta)$ is twice continuously differentiable in $\theta \in \Theta$ and $\bar\theta \in \Theta$, and iii) the A-algorithm (1) is initialized at a point $\theta^0 \in R_+$ for a norm $\|\cdot\|$.

1. The iterates $\theta^i$, $i = 0, 1, \ldots$ all lie in $R_+$,

2. the successive differences $\Delta\theta^i = \theta^i - \theta^*$ of the A-algorithm obey the recursion:

$$\Delta\theta^{i+1} = [A_1(\theta^{i+1}, \theta^i)]^{-1} A_2(\theta^{i+1}, \theta^i) \cdot \Delta\theta^i, \qquad (4)$$

3. the norm $\|\Delta\theta^i\|$ converges monotonically to zero with at least linear rate, and

4. $\Delta\theta^i$ asymptotically converges to zero with root convergence factor


II.  Convergence  Theorem 

A region of monotone convergence relative to the vector norm $\|\cdot\|$ of the A-algorithm (1) is defined as any open ball $B(\theta^*, \delta) = \{\theta : \|\theta - \theta^*\| < \delta\}$ centered at $\theta = \theta^*$ with radius $\delta > 0$ such that if the initial point $\theta^0$ is in this region then $\|\theta^i - \theta^*\|$, $i = 1, 2, \ldots$, converges monotonically to zero. Note that as defined, the shape in $R^p$ of the region of monotone convergence depends on the norm used. However, in $R^p$ monotone convergence in a given norm implies convergence, though possibly non-monotone, in any other norm.

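Define the $p \times p$ matrices obtained by averaging $\nabla^{20} Q(u, \bar{u})$ and $\nabla^{11} Q(u, \bar{u})$ over the line segments $u \in \overline{\theta\theta^*}$ and $\bar{u} \in \overline{\bar\theta\theta^*}$:

$$A_1(\theta, \bar\theta) = -\int_0^1 \nabla^{20} Q(t\theta + (1-t)\theta^*,\ t\bar\theta + (1-t)\theta^*)\, dt$$

$$A_2(\theta, \bar\theta) = \int_0^1 \nabla^{11} Q(t\theta + (1-t)\theta^*,\ t\bar\theta + (1-t)\theta^*)\, dt.$$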

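Also, define the following set:

$$S(\bar\theta) = \{\theta \in \Theta : Q(\theta, \bar\theta) \ge Q(\bar\theta, \bar\theta)\}.$$

By construction of the A-algorithm (1), we have $\theta^{i+1} \in S(\theta^i)$.

Definition 1 For a given vector norm $\|\cdot\|$ and induced matrix norm $|||\cdot|||$ define $R_+ \subset \Theta$ as the largest open ball $B(\theta^*, \delta) = \{\bar\theta : \|\bar\theta - \theta^*\| < \delta\}$ such that for each $\bar\theta \in B(\theta^*, \delta)$:

$$A_1(\theta, \bar\theta) > 0, \quad \text{for all } \theta \in S(\bar\theta) \qquad (2)$$

and for some $0 \le \alpha < 1$

$$||| [A_1(\theta, \bar\theta)]^{-1} \cdot A_2(\theta, \bar\theta) ||| \le \alpha, \quad \text{for all } \theta \in S(\bar\theta). \qquad (3)$$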

1  This  work  was  partially  supported  by  grants:  NSF-BCS- 
9024370,  DOE-FG02-87ER60561,  NIH-CA-60711 


$$\rho\left([-\nabla^{20} Q(\theta^*, \theta^*)]^{-1} \nabla^{11} Q(\theta^*, \theta^*)\right)$$


III.  Tomography  Application 

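In emission computed tomography the objective is to estimate the object intensity vector $\theta = [\theta_1, \ldots, \theta_p]^T$, $\theta_b \ge 0$, from Poisson distributed projection data $Y = [Y_1, \ldots, Y_m]^T$. The Shepp-Vardi implementation of the ML-EM algorithm for estimating the intensity $\theta$ has the form:

$$\theta_b^{i+1} = \frac{\theta_b^i}{P_b} \sum_{d=1}^{m} \frac{Y_d P_{d|b}}{\sum_{j=1}^{p} P_{d|j} \theta_j^i}, \quad b = 1, \ldots, p, \qquad (5)$$

where $P_{d|b}$ is a full rank matrix of transition probabilities from emission locations to projection locations and $P_b = \sum_{d=1}^{m} P_{d|b}$.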

Using  Theorem  1  we  obtain 

Theorem 2 Assume that the unpenalized ECT EM algorithm specified by (5) converges to the strictly positive limit $\theta^*$. Then, for some sufficiently large positive integer $M$:

$$\|\ln\theta^{i+1} - \ln\theta^*\| \le \alpha \|\ln\theta^i - \ln\theta^*\|, \quad i \ge M,$$

where $\alpha = \rho([B + C]^{-1} C)$, $B = B(\theta^*)$, $C = C(\theta^*)$, and the norm $\|\cdot\|$ is defined as:

$$\|u\|^2 \stackrel{\mathrm{def}}{=} \sum_{b=1}^{p} P_b \theta_b^* u_b^2. \qquad (6)$$

Lange and Carson (1984) showed that the ECT EM algorithm converges to the maximum likelihood estimate. As long as $\theta^*$ is strictly positive, Theorem 2 asserts that in the final iterations of the algorithm the logarithmic differences $\ln\theta^i - \ln\theta^*$ converge monotonically to zero relative to the norm (6).
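The Shepp-Vardi ML-EM iteration discussed in this section can be tried on a toy problem. The sketch below is an illustration only: it uses the standard multiplicative ML-EM update, noise-free data so that the limit $\theta^*$ is known in advance, and a weighted norm of the logarithmic differences with weights $P_b \theta_b^*$, which is an assumed reading of (6). The matrix `P`, the intensities and the iteration count are made-up values.

```python
import math


def ml_em_step(theta, y, P):
    """One ML-EM update: theta_b <- (theta_b / P_b) * sum_d y_d P[d][b] / ybar_d."""
    m, p = len(P), len(theta)
    # Predicted mean of each projection bin under the current intensities.
    ybar = [sum(P[d][b] * theta[b] for b in range(p)) for d in range(m)]
    # Column sums P_b = sum_d P[d][b].
    col = [sum(P[d][b] for d in range(m)) for b in range(p)]
    return [
        theta[b] / col[b] * sum(y[d] * P[d][b] / ybar[d] for d in range(m))
        for b in range(p)
    ]


def poisson_loglik(theta, y, P):
    """Poisson log-likelihood, dropping the term that does not depend on theta."""
    ybar = [sum(row[b] * theta[b] for b in range(len(theta))) for row in P]
    return sum(yd * math.log(yb) - yb for yd, yb in zip(y, ybar))


def log_diff_norm(theta, theta_star, P):
    """Weighted norm of ln(theta) - ln(theta_star); weights P_b * theta_star_b (assumed)."""
    total = 0.0
    for b in range(len(theta)):
        col = sum(row[b] for row in P)
        u = math.log(theta[b]) - math.log(theta_star[b])
        total += col * theta_star[b] * u * u
    return math.sqrt(total)


# Hypothetical 3-bin, 2-pixel system; each column of P sums to one.
P = [[0.5, 0.1], [0.3, 0.3], [0.2, 0.6]]
theta_true = [4.0, 2.0]
# Noise-free data, so the ML estimate coincides with theta_true.
y = [sum(row[b] * theta_true[b] for b in range(2)) for row in P]

theta = [1.0, 1.0]
logliks = [poisson_loglik(theta, y, P)]
norms = [log_diff_norm(theta, theta_true, P)]
for _ in range(200):
    theta = ml_em_step(theta, y, P)
    logliks.append(poisson_loglik(theta, y, P))
    norms.append(log_diff_norm(theta, theta_true, P))
```

Inspecting `norms` along the run gives a quick empirical look at the behaviour that Theorem 2 describes for the final iterations; the sketch does not compute the matrices $B$ and $C$.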

I. Introduction

Csiszar [1] presented an axiomatic derivation of the use of the I-divergence as a discrepancy measure between nonnegative vectors. Snyder, Schulz, and O'Sullivan [2] then proposed the use of the I-divergence as an optimality criterion in deblurring problems, and introduced the deterministic version of the EM algorithm. Byrne [3] used a similar scenario to [2], but also looked at reverse entropy measures and included maximum entropy penalties. O'Sullivan [4] introduced roughness penalties for use in stochastic problems where the use of Markov random fields may not arise naturally; these penalties are used here for the deterministic problem. Vardi [5,6] has related papers.

Let $\theta \in R^p$ be a vector of parameters to be estimated. The available data are $y_m = \sum_{n=1}^{N} H_{mn} x_n$, where $y \in R_+^M$, $H_{mn} \ge 0$, and $x \in R_+^N$ depends on $\theta$. The manner in which $x$ depends on $\theta$ yields slightly different algorithms. The matrix $H$ is assumed to have at least one positive entry in each column. We show that $x$ may be considered to be the complete data for $\theta$. The incomplete data I-divergence is shown to equal an averaged complete data I-divergence plus an additional term. This decomposition is a generalization of the decomposition for the stochastic data problem and it simplifies steps used to prove convergence in [2]. The deterministic EM algorithm then consists of minimizing the averaged complete data I-divergence; the averaging step corresponds to the E-step, the minimization is the M-step. Finally, a maximum entropy penalty and a roughness penalty are incorporated into the problem.

II.  Deterministic  EM  Algorithm  Derivation 

The problem is to find the $\theta$ that minimizes $I(y \| Hx(\theta))$, where $I(y \| \gamma) = \sum_{m=1}^{M} \left[ y_m \log \frac{y_m}{\gamma_m} - y_m + \gamma_m \right]$, and log means natural log. Let $\hat{x} \in R_+^N$ and define a function of $x$ by

g„„(x) = - where  Y = Ais°’ denote  by  h  ■ x 
ri 

the $N \times 1$ vector whose $n$th entry is $x_n \sum_{m=1}^{M} H_{mn}$.

Theorem  1: 

/(ylHx)  =  X  [4(s™.(x)|h  •  x>  -  4(S™.(x)ISm*(x))]  +  F(y-X)> 

m=  1  * 

where $F$ does not depend on $x$, and $\hat{x}$ is arbitrary.

The notation $I_n(\cdot \| \cdot)$ indicates that the I-divergence is computed over the $n$ index only. The vector $y$ may be referred to as the incomplete data and $I(y \| Hx)$ is the incomplete data I-divergence. The theorem states that the incomplete data I-divergence equals the sum of three terms. The first is an averaged I-divergence involving the vector $h \cdot x$ and is called the complete data I-divergence; $x$ is the complete data vector. The second term is an I-divergence that is used to guarantee monotonicity of the sequence of likelihood values. The third term is an extra term that does not depend on the complete data.

The deterministic EM algorithm then has the following steps at iteration $k+1$ given an estimate $\theta^k$ and the corresponding $x^k = x(\theta^k)$ from iteration $k$:

My 

E-Step:  Compute  Q(x\xk)  =  £  T7  4(g™(x  )>h  *  x) 

m=  1  Y 

M-Step: Find $\theta^{k+1} = \arg\min_\theta Q(x(\theta) \mid x^k)$.

The  objective  function  is  nonincreasing  since  (using  x  =  x*) 

M  v 

7(ylHx*+1)  -  7(ylHx*)  ^7  Un(8mntf)b  *  x*+1) 

m=l  1 

-h(gnm(*k) I*1  ■  X*)  “  h(gnm&k)\grm(Xk^))l 
The sum of the first two terms in the bracket is nonpositive by the M-step, and the last term is nonpositive because the I-divergence is nonnegative. For discussions of convergence see [2,3,7]. If $\theta = x$, the algorithm derived in [2] results,

$$x_n^{k+1} = \frac{x_n^k}{\sum_{m} H_{mn}} \sum_{m} \frac{y_m H_{mn}}{\sum_{n'} H_{mn'} x_{n'}^k}.$$
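For the case $\theta = x$, the nonincreasing behaviour of the objective can be checked numerically. The following is a minimal sketch, assuming the standard I-divergence and the familiar multiplicative update for this case; the matrix `H`, the data `y` (deliberately not of the form $Hx$) and the starting point are made-up values.

```python
import math


def i_divergence(y, gamma):
    """I(y || gamma) = sum_m [y_m ln(y_m / gamma_m) - y_m + gamma_m], with 0 ln 0 = 0."""
    total = 0.0
    for ym, gm in zip(y, gamma):
        if ym > 0.0:
            total += ym * math.log(ym / gm)
        total += gm - ym
    return total


def apply(H, x):
    """Matrix-vector product Hx."""
    return [sum(h * xn for h, xn in zip(row, x)) for row in H]


def em_update(x, y, H):
    """x_n <- x_n / (sum_m H_mn) * sum_m H_mn y_m / (Hx)_m."""
    Hx = apply(H, x)
    new_x = []
    for n in range(len(x)):
        col_sum = sum(row[n] for row in H)
        back = sum(row[n] * ym / hx for row, ym, hx in zip(H, y, Hx))
        new_x.append(x[n] * back / col_sum)
    return new_x


# Hypothetical blur matrix with a positive entry in every column.
H = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]
y = [3.0, 1.0, 2.0]
x = [1.0, 1.0]

objective = [i_divergence(y, apply(H, x))]
for _ in range(50):
    x = em_update(x, y, H)
    objective.append(i_divergence(y, apply(H, x)))
```

The list `objective` records $I(y \| Hx^k)$ at every iteration, so the monotonicity argument given above can be inspected directly.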


Byrne [3] introduced maximum entropy penalties ($I(x \| \xi)$ or $I(\xi \| x)$) into the deterministic problem; prior information is assembled into a prior guess $\xi$. O'Sullivan [4] introduced roughness penalties that penalize discrepancies with neighbors. Let $\{S_i, 1 \le i \le I\}$ be a set of $N \times N$ permutation matrices. Then the roughness penalties are of the form $I(x \| S_i x)$. Furthermore, a generalized EM algorithm was introduced in [4] based on the resulting neighborhood structures. The minimum penalized I-divergence problem is to compute the vector $x$ that minimizes $I(y \| Hx) + \alpha I(\xi \| x) + \sum_{i} I(x \| S_i x)$. The GEM algorithm from [4] may be used, replacing the complete and incomplete data spaces for that stochastic problem by the corresponding spaces from this deterministic problem, to obtain a sequence of iterates that converges to the optimum.

Abstract - We propose a sequential approach for studying the Viterbi algorithm via a renewal sequence of the most informative stopping times which allows us in particular to obtain new asymptotic "single-letter" decoding conditions of equivalency between the Baum-Welch, segmental K-means and vector quantization algorithms of the hidden Markov models parameters estimation.

I.  Introduction 

Let $\{h_t\}$ be a finite hidden Markov chain (HMC) indirectly observed through a process $\{z_t\}$, $z_t \in R^D$, $t = 0, 1, \ldots$. Given a sequence of observations $z_0^N$ and a set $\lambda = \{\pi_{h_0}, a_{h_{t-1}h_t}, b(z_t/h_t)\}$ of prior, transition and observation probabilities, respectively, the Viterbi algorithm (VA) allows us to find the most likely state-sequence (MLSS) $\hat{h}_0^N$ of the HMC by maximizing the additive criterion $d_N(h_0^N) = \max_{h_0^N} \ln P\{h_0^N, z_0^N\}$ by a dynamic programming (DP) method. Then the MLSS $\hat{h}_0^{N-1}$, or the optimal segmentation of the observations $z_0^{N-1}$, can be obtained by the backtracking $t = N-1, \ldots, 0$: $\hat{h}_t = k_{t+1}(\hat{h}_{t+1})$, where $\hat{h}_N = \arg\max_{h_N} d_N(h_0^{N-1}(h_N))$ (see [1]-[3]).

The direct implementation of the VA requires storing the values of $\hat{h}_t$, which fills up a table $K$ ($m \times N$) with columns of back pointers $k_t : H_t \to H_{t-1}$, $t = N, N-1, \ldots$, with $H_N = H = \{0, 1, \ldots, m-1\}$. But if for some moment $s$ and some $j \in H$ we have $k_{s+1}(h_{s+1}) = j$ for all $h_t \in H_t = H$, $t > s$, then $\hat{h}_s = j$ is called a special column (SC) in the table $K$ of optimal VA decisions [2], and the moments at which the SCs appear are the most informative stopping times (MISTs) for the Viterbi recognition of HMS [3], [4], because after they appear further observations do not change the previous decisions of the VA.
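The back-pointer table and the notion of a special column can be made concrete with a small log-domain Viterbi routine. This is a sketch under stated assumptions rather than the author's sequential procedure: the function and variable names are hypothetical, the model values are made up, and a special column is detected simply as a time index at which every back pointer into that time agrees on one state.

```python
import math


def viterbi(log_prior, log_trans, log_obs):
    """Log-domain Viterbi. log_obs[t][h] = ln b(z_t | h).

    Returns (path, back, best_score), where back[s][h] is the best
    predecessor at time s of state h at time s + 1.
    """
    m, n = len(log_prior), len(log_obs)
    score = [log_prior[h] + log_obs[0][h] for h in range(m)]
    back = []
    for t in range(1, n):
        new_score, ptr = [], []
        for h in range(m):
            best = max(range(m), key=lambda g: score[g] + log_trans[g][h])
            ptr.append(best)
            new_score.append(score[best] + log_trans[best][h] + log_obs[t][h])
        score = new_score
        back.append(ptr)
    last = max(range(m), key=lambda h: score[h])
    path = [last]
    for ptr in reversed(back):  # backtracking through the table
        path.append(ptr[path[-1]])
    path.reverse()
    return path, back, score[last]


def special_columns(back):
    """Pairs (s, j): all back pointers from time s + 1 point to state j at time s."""
    return [(s, ptr[0]) for s, ptr in enumerate(back) if len(set(ptr)) == 1]


def joint_log_prob(seq, log_prior, log_trans, log_obs):
    """ln P{h, z} for one state sequence, used only as a sanity check."""
    total = log_prior[seq[0]] + log_obs[0][seq[0]]
    for t in range(1, len(seq)):
        total += log_trans[seq[t - 1]][seq[t]] + log_obs[t][seq[t]]
    return total


# Hypothetical two-state model observed over four time steps.
log_prior = [math.log(0.6), math.log(0.4)]
log_trans = [[math.log(0.7), math.log(0.3)], [math.log(0.4), math.log(0.6)]]
log_obs = [[math.log(a), math.log(b)]
           for a, b in [(0.9, 0.2), (0.8, 0.3), (0.1, 0.7), (0.2, 0.9)]]

path, back, score = viterbi(log_prior, log_trans, log_obs)
cols = special_columns(back)
```

Whenever `cols` contains a pair `(s, j)`, the decoded state at time `s` is already fixed to `j` regardless of later observations, which is the property that makes these moments useful as stopping times.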


II.  Results 

We establish the renewal properties of the MIST sequence and the duality between Wald's sequential analysis and the VA, which allow us to develop a sequential version of the segmental K-means algorithm for reducing the biases of estimates if the set of parameters $\lambda$ is unknown.

Then we consider the asymptotic conditions of equivalency between the Baum-Welch (BW) and segmental K-means (SKM) algorithms and the vector quantization (VQ) approach, which has important applications in speech recognition (see [5], [6]). When the set of parameters $\{\lambda\}$ is unknown, it can be estimated, for instance, by the BW: $\lambda^* = \arg\max_\lambda P_\lambda(Z)$, or by the SKM: $\hat\lambda = \arg\max_\lambda \max_h P_\lambda(Z, h)$ algorithms, which can be achieved by the following iterative maximizations, respectively:

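BW: $\lambda_{i+1} = \arg\max_\lambda \sum_h P_{\lambda_i}(h/Z) \ln P_\lambda(Z, h)$,
SKM: $\lambda_{i+1} = \arg\max_\lambda \sum_h \delta(h - \hat{h}(\lambda_i)) \ln P_\lambda(Z, h)$,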


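where $h = h_0^N$, $Z = z_0^N$ and $\delta(\cdot)$ is the Kronecker $\delta$-function. Thus, if $P_{\lambda_i}(h/Z) \approx \delta(h - \hat{h}(\lambda_i))$, $\forall i$, then $\hat{h}(\lambda_i)$ is a dominant MLSS for $\forall i$ and BW $\lambda^* \approx$ SKM $\hat\lambda$. A sufficient condition of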
the  existence  of  such  a  dominant  MLSS  h^T  = 
where  h*  =  argniin^  D-1  In b(zt/Qh),  has  been  given  in  [5]: 

$$-\lim_{D \to \infty} D^{-1} \ln b(z_t/\theta_{h_t}) = H_{h_t}, \qquad (1)$$

where $H_{h_t}$ is a constant entropy depending on $h_t$ and $\theta_{h_t}$ is a set of parameters. Furthermore, the probability of deviation from the MLSS given $Z$ decays uniformly exponentially and does not depend on the length $N$ of the sequence $h$. Then from [3] one can have

Theorem 1 Given (1), the dominant MLSS is asymptotically single-letter decoding, as $D \to \infty$.

Thus, in the limit, the MISTs will appear at each step, which coincides with the case of a generalized single-letter decoding [2]. For the Gaussian HMC with the autoregression covariance matrix associated with state $h_t$ we can, by using the renewal properties of the MIST sequence, further strengthen the result of [6] that as $D \to \infty$, the SKMA becomes equivalent to the VQ approach, which in this case minimizes the Itakura-Saito distortion measure.


Theorem  2  If  pij  >  S  >  0,  for  VSCi  (as  N  — >  oo),  then 

=  -  £  [g(^)+-/s(z"’AhJ 

n6{T0)Ti}  TlGlTQ.n} 


where  H(zn)  is  the  empirical  entropy  of  zn,  dis  is  the  corre¬ 
sponding  discrete  Itakura-Saito  distortion  measure,  Tk  is  the 
kth  moment  of  the  SC  appearing,  and 


lim  lim 

D— >oo  N—+oo 


lnmax/t  Pc  ( Z ,  h) 

— m — 


-Ei[H(zn)  + 


dis{zn:  A hn) 

Abstract - An original procedure for estimating the model parameters of a noncausal Gauss-Markov Random Field (GMRF) from noisy observations is proposed. Starting from a suitable 'local' representation of the field and taking into account the symmetry property of the so-called 'potential fields' [3] describing the GMRF, a linear equation system relating the model parameters to the (generally non-stationary) 2D autocorrelation function (acf) of the observed field is derived. Its solution for a known (or estimated) acf directly gives the parameter estimates of the GMRF. The unknown variance of any observation noise present can also be estimated jointly with the model parameters.

Summary 

A discrete-index 2D zero-mean Gaussian random process $\{x(s) \in R^1, s \in I\}$ defined over a rectangular lattice $I$ and constituting a noncausal $d$-th order homogeneous GMRF with respect to (wrt) an assigned 'support region' (or 'neighbourhood system' [3,4]) $\eta(d)$ admits the 'innovations representation' [1,2]

$$x(s) = \sum_{r \in \eta(d)} \phi(r)\, x(s + r) + u(s), \quad s \in I^0(\eta(d)). \qquad (1)$$

In (1) the set $\eta(d)$ is assumed symmetric and constituted by an even number of points $2L(d)$ [4]; $I^0(\eta(d))$ is the set of 'internal points' of $I$ wrt $\eta(d)$; $\{\phi(r) \in R^1, r \in \eta(d)\}$ are the so-called 'field potentials', related as reported in [2] to the acf $\{R_u(r)\}$ of the 2D stationary zero-mean Gaussian 'innovations process' $\{u(s) \in R^1, s \in I^0(\eta(d))\}$, with variance $K_u$.

From the obvious symmetry property $R_u(r) = R_u(-r)$ we have $\phi(r) = \phi(-r)$, $r \in \eta(d)$. This allows the support region $\eta(d)$ to be partitioned into the subsets $\eta_+(d), \eta_-(d) \subset \eta(d)$, each constituted by $L(d)$ sites and such that if $r \in \eta_+(d)$ then $-r \in \eta_-(d)$ for every $r \in \eta(d)$. In this way (1) can be rewritten as

$$x(s) = \sum_{r \in \eta_-(d)} \phi(r)\, [x(s + r) + x(s - r)] + u(s), \quad s \in I^0(\eta(d)). \qquad (2)$$

The representation of the GMRF in (2) is then completed by specifying the associated boundary conditions (b.c.), i.e. the statistics of the random vector constituted by the r.v.s extracted from the random field $\{x(s)\}$ at the boundary points of the lattice $I$.

It is also assumed that the GMRF is corrupted by a 2D stationary zero-mean additive white noise process $\{w(s) \in R^1, s \in I\}$ independent from $\{x(s)\}$ and with (unknown) variance $\sigma_w^2$, so that the resulting observation process $\{y(s) \in R^1, s \in I\}$ is defined as: $y(s) = x(s) + w(s)$.

An original procedure for estimating the model parameters of a GMRF of an arbitrary order can be obtained from the 'local' representation in (2). In fact, from the model in (2) the following set of linear algebraic equations can be built up:

$$R_y(s; s+m) = \sum_{r \in \eta_-(d)} \phi(r)\, [R_y(s+r; s+m) + R_y(s-r; s+m)] + [\sigma_w^2 + K_u]\, \delta(m), \quad s \in I^0(\eta(d)),\ s+m \in I, \qquad (3)$$

thus relating the unknown model parameters to the 2D acf $\{R_y(s; t)\}$ of the process $\{y(s)\}$; the acf is assumed 'a priori' known or estimated from the available observations ($\delta(m)$ in (3) is the Kronecker delta).

Writing (3) for a set of $L(d)$ sites $s \in I^0(\eta(d))$ suitably chosen and for $m \notin \{\eta_-(d) \cup \{0\}\}$ (so that $\delta(m) = 0$), a matrix linear algebraic equation system is directly derived, and from its solution the field potentials $\{\phi(r)\}$ are obtained. Such a system can be considered as the extension to the case of 2D noncausal GMRFs of the so-called 'high-order' Yule-Walker equations for the parameter identification of 1D causal AR processes. From the field potentials, writing (3) for $s \in I^0(\eta(d))$ and for any $m \in \eta_-(d)$ such that $\phi(m) \ne 0$, the noise variance $\sigma_w^2$ is then calculated; finally, the parameter $K_u$ is computed from (3) written for $m = 0$.

The illustrated parameter estimation procedure is fully general: in fact it is valid for GMRFs defined on both finite and infinite lattices and for any kind of assumed boundary conditions, periodic or non-periodic, their influence being embedded in the acf of the field itself. Moreover, it can be easily particularized to the case when the noise variance $\sigma_w^2$ is known, or when the observation noise is absent.

Comparing the proposed solution to alternative methods available in the literature, some improvements can be outlined. In more detail, exploiting the symmetry $\phi(r) = \phi(-r)$ gives an algebraic system of half the size of the system in [1]. On the other hand, the procedure in [4] is based on an iterative search algorithm, thus giving a computational complexity proportional to the size of the field, while the proposed solution is based on a 'local' description of the GMRF so that it does not involve time-consuming iterative search algorithms and its complexity is independent of the size of the field. Finally, in the proposed approach the variance of the observation noise is estimated together with the model parameters, while the procedures in [1] and [4] require that it is known (or separately estimated). The results of some computer simulations of the above procedure are reported in Tab. I and in [7].

                 Case 1                Case 2                Case 3
Parameter     True     Estimated    True     Estimated    True     Estimated
phi(-1,-1)    5.0e-2   4.27e-2      1.2e-1   1.10e-1      5.0e-2   5.75e-2
phi(-1,0)     5.0e-2   4.89e-2      1.2e-1   1.17e-1      9.0e-2   8.80e-2
phi(-1,+1)    5.0e-2   6.92e-2      1.2e-1   1.21e-1      5.0e-2   5.93e-2
phi(0,+1)     5.0e-2   4.80e-2      1.2e-1   1.38e-1      9.0e-2   9.80e-2
K_u           10.0     10.01        10.0     10.23        10.0     9.815

Tab. I - True and estimated parameter values for three cases of noise-free second-order (i.e., $\sigma_w^2 = 0$, $d = 2$) 2D GMRFs with pinned-to-zero boundary conditions. The field potentials $\{\phi(r)\}$ and $K_u$ are calculated as in (3) by estimating the acfs $\{R_x(s; t)\}$ from $10^4$ independent realizations.

Abstract - In this paper we use the uniform Cramer-Rao (CR) lower bound [1] to generate bias-variance tradeoff curves which separate achievable from unachievable regions in the estimator bias-variance plane.

I.  Introduction 

Let $\theta = [\theta_1, \ldots, \theta_n]^T \in \Theta$ be a vector of unknown parameters which parameterize the distribution of an observed random variable $Y$. Let $\hat\theta_1 = \hat\theta_1(Y)$ be an estimator of the scalar $\theta_1$ and define the estimator bias function $b_1 = b_1(\theta) = E_\theta[\hat\theta_1] - \theta_1$ and the variance function $\sigma^2 = \sigma^2(\theta) = E_\theta[(\hat\theta_1 - E_\theta[\hat\theta_1])^2]$. The goal of this work is to quantify fundamental tradeoffs between the bias and variance functions for any parametric estimation problem. When considered as surfaces over the parameter space $\Theta$, the bias and variance provide a very informative description of estimator performance; for example, they jointly specify the MSE. However, since comparison of performance surfaces over a large set $\Theta$ is usually impractical, the bias and variance in a small neighborhood are of greater interest. In this case, the bias gradient $\nabla_\theta b_1$ is more useful since it is insensitive to constant and hence removable biases. It can be shown that the bias gradient is directly related to the width of the point spread function for penalized maximum likelihood deconvolution problems [2]. The weighted norm of the bias gradient is indirectly related to the variation of the bias function over $\Theta$ by: $|\Delta b_1(\theta)| \le \|\nabla_\theta b_1\|_D + o(\det|D|)$, where $\|u\|_D^2 = u^T D^T D u$ and $D$ is an invertible matrix whose determinant is proportional to the volume of the region.

II.  The  Bias- Variance  Tradeoff  Curve 

The tradeoff curve is derived from a generalization of the bound on estimator variance presented in [1]. Unlike the bound of [1], this bound applies to the case of singular Fisher information matrices (FIM), an important case arising in deconvolution problems, and permits use of any weighted $l_2$ norm of the bias gradient.

Theorem 1 For a fixed scalar $\delta \in [0, 1]$ let $\hat\theta_1$ be an estimator whose bias gradient satisfies the norm constraint $\|\nabla_\theta b_1\|_D^2 = \nabla_\theta b_1^T D^T D \nabla_\theta b_1 \le \delta^2$, where $D$ is an arbitrary non-singular matrix. Define the oblique projection operator ($n \times n$ matrix) $P_{F_Y} = F_Y [F_Y D^T D F_Y]^{+} F_Y D^T D$ which maps $n$-dimensional space onto the column space of the FIM $F_Y$, and define the $n$-element unit vector $e_1 = [1, 0, \ldots, 0]^T$. Then the variance of $\hat\theta_1$ satisfies:

$$\mathrm{var}_\theta(\hat\theta_1) \ge B(\theta, \delta), \qquad (1)$$

where if $\|P_{F_Y} e_1\|_D \le \delta$ then $B(\theta, \delta) = 0$, while if $\|P_{F_Y} e_1\|_D > \delta$ then:

B(Qj  8)  =  e^FyCn 

1This  work  was  partially  supported  by  NSF  Grant  BCS-9024370 


In (2) $\lambda \ge 0$ is determined by the unique non-negative solution of the following equation:

$$g(\lambda) = e_1^T [F_Y (\lambda D^T D + F_Y)^{-2} F_Y] e_1 = \delta^2. \qquad (3)$$

By calculating the family of points $\{(B(\theta, \delta), \delta) : \delta \in [0, 1]\}$ we sweep out a curve in the bias-variance plane which lower bounds any estimator plotted in the plane. Figure 1 illustrates this curve for a simple one dimensional Gaussian deconvolution problem and the unweighted $l_2$ norm ($D$ = identity) [2].
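To see what such a curve looks like in the simplest setting, consider a single scalar parameter with Fisher information $F > 0$ and $D = 1$. The sketch below does not use the general expression (2); it relies on the standard biased CR inequality, $\mathrm{var} \ge (1 + b')^2 / F$, minimized over bias gradients with $|b'| \le \delta$, which gives $(1 - \delta)^2 / F$. The multiplier is obtained by reading (3) in the scalar case as $F^2 / (\lambda + F)^2 = \delta^2$, which is an assumption about how (3) specializes. All names and the value of `fisher` are hypothetical.

```python
import math


def scalar_tradeoff_point(fisher, delta):
    """Return (variance lower bound, multiplier lam) for a scalar parameter.

    bound: (1 - delta)^2 / fisher, from the biased CR inequality with |b'| <= delta.
    lam:   solution of fisher^2 / (lam + fisher)^2 = delta^2 (assumed scalar form
           of equation (3)); infinite when delta = 0, i.e. the unbiased case.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    bound = (1.0 - delta) ** 2 / fisher
    lam = math.inf if delta == 0.0 else fisher * (1.0 - delta) / delta
    return bound, lam


def g(lam, fisher):
    """Scalar form of g(lambda) with D = 1."""
    return (fisher / (lam + fisher)) ** 2


# Sweep delta over [0, 1] to trace the curve {(B, delta)}.
fisher = 4.0
curve = [(scalar_tradeoff_point(fisher, k / 10.0)[0], k / 10.0) for k in range(11)]
```

At `delta = 0` the curve starts at the ordinary unbiased CR bound `1 / fisher`, and it falls to zero at `delta = 1`, where a constant estimator (zero variance, bias gradient of magnitude one) becomes admissible.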


Figure 1: Bias-Variance Plane and Lower Bound. (Plotted curve: Uniform CR bound.)

The region above and including the curve is the so-called 'achievable' region where all the realizable pairs of estimator variance and bias gradient exist. Note that if an estimator lies on the curve then lower variance can only be bought at the price of increased bias and vice versa. For this example the regularized least squares estimator attains the optimal bias-variance tradeoff, i.e. it hits the lower bound for all values of $\delta$ [2]. In this case the bias gradient norm $\delta$ was swept out by varying the smoothing (regularization) parameter of the estimator.

In general, placing an estimator somewhere within the achievable region of Figure 1 requires the variance and the length of the estimator bias gradient. In most cases the variance and the bias-gradient length are analytically intractable and must be empirically estimated. Since the sample mean estimate of the bias gradient norm has severe positive bias, some form of bias correction is necessary. We have developed a bootstrap estimator and a $(1 - \alpha)\%$ lower confidence bound for this purpose.

Abstract - A number of new spherical t-designs in three and four dimensions are described. Evidence is presented to suggest that in three dimensions the resulting catalog gives a complete list of all designs of strength $t \le 9$.


I.  Introduction 

A set of $N$ points $\mathcal{P} = \{P_1, \ldots, P_N\}$ on the unit sphere $\Omega_d = S^{d-1} = \{x = (x_1, \ldots, x_d) \in R^d : x \cdot x = 1\}$ forms a spherical t-design if the identity

$$\int_{\Omega_d} f(x)\, d\mu(x) = \frac{1}{N} \sum_{i=1}^{N} f(P_i) \qquad (1)$$

(where $\mu$ is uniform measure on $\Omega_d$ normalized to have total measure 1) holds for all polynomials $f$ of degree $\le t$ ([3]; [4]; [2, §3.2]). In the present paper we are concerned only with the cases $d = 3$ and 4.
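The defining identity can be tested numerically for a given point set in three dimensions: since monomials span the polynomials, it suffices to compare the average of each monomial $x^a y^b z^c$ of total degree at most $t$ over the points with its average over the sphere. The sketch below is an illustration with hypothetical function names; the closed-form sphere moments come from the standard Gamma-function formula, and the three test configurations are the regular tetrahedron, octahedron and icosahedron.

```python
import math
from itertools import product


def sphere_moment(a, b, c):
    """Average of x^a y^b z^c over the unit sphere in R^3 (total measure 1)."""
    if a % 2 or b % 2 or c % 2:
        return 0.0  # odd exponents integrate to zero by symmetry
    num = (math.gamma(1.5) * math.gamma((a + 1) / 2)
           * math.gamma((b + 1) / 2) * math.gamma((c + 1) / 2))
    den = math.pi ** 1.5 * math.gamma((a + b + c + 3) / 2)
    return num / den


def is_design(points, t, tol=1e-9):
    """True if the point set reproduces all monomial averages of degree <= t."""
    n = len(points)
    for a, b, c in product(range(t + 1), repeat=3):
        if a + b + c > t:
            continue
        avg = sum(x ** a * y ** b * z ** c for x, y, z in points) / n
        if abs(avg - sphere_moment(a, b, c)) > tol:
            return False
    return True


def strength(points, t_max=8):
    """Largest t <= t_max for which the points form a spherical t-design."""
    t = 0
    while t < t_max and is_design(points, t + 1):
        t += 1
    return t


r = 1.0 / math.sqrt(3.0)
tetrahedron = [(r, r, r), (r, -r, -r), (-r, r, -r), (-r, -r, r)]

octahedron = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
              (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

phi = (1.0 + math.sqrt(5.0)) / 2.0
scale = math.sqrt(1.0 + phi * phi)
icosahedron = []
for s1 in (1, -1):
    for s2 in (1, -1):
        a, b = s1 / scale, s2 * phi / scale
        icosahedron += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
```

Calling `strength` on these three configurations gives the strengths of particular 4-, 6- and 12-point sets; a check of this kind can only confirm that a specific configuration is a design, not that no better configuration with the same number of points exists.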


II. Spherical t-Designs in Three Dimensions

In three dimensions it is trivial that 1-designs exist if and only if $N \ge 2$, and Mimura [7] showed that 2-designs exist if and only if $N = 4$, $\ge 6$. Bajnok [1] found 3-designs for $N = 6$, 8, $\ge 10$ and conjectured that they do not exist for $N = 7$ and 9. In [5] we showed that 4-designs exist for $N = 12$, 14, $\ge 16$, and conjectured that no others exist. Reznick [8] showed that 5-designs exist for $N = 12$, 16, 18, 20, 22, 24, $\ge 26$. We have found 5-designs with $N = 23$ and 25, and, our search having repeatedly failed in the remaining cases, conjecture that 5-designs do not exist for $N = 13$-15, 17, 19 and 21.

Let $\tau(N)$ denote the largest value of $t$ for which an $N$-point 3-dimensional spherical t-design exists. Since a t-design is also a t'-design for all $t' \le t$, an $N$-point spherical t-design exists if and only if $\tau(N) \ge t$. Our results lead us to believe that the following are the values of $\tau(1), \ldots, \tau(50)$:

0, 1, 1, 2, 1, 3, 2, 3, 2, 3,

3, 5, 3, 4, 3, 5, 4, 5, 4, 5,

4, 5, 5, 7, 5, 6, 5, 6, 6, 7,

6, 7, 6, 7, 6, 8, 7, 7, 7, 8,

7, 8, 7, 8, 8, 8, 8, 9, 8, 9

This  is  part  of  a  much  larger  table  that  will  appear  in  [6]. 

The results of this table then suggest that, in three dimensions, spherical 6-designs with $N$ points exist for $N = 24$, 26, $\ge 28$; 7-designs for $N = 24$, 30, 32, 34, $\ge 36$; 8-designs for $N = 36$, 40, 42, $\ge 44$; 9-designs for $N = 48$, 50, 52, $\ge 54$; 10-designs for $N = 60$, 62, $\ge 64$; 11-designs for $N = 70$, 72, $\ge 74$; and 12-designs for $N = 84$, $\ge 86$. The existence of some of these designs is established analytically, while others are given by very accurate numerical coordinates.

The 24-point 7-design was first found by McLaren in 1963, and (although not identified as such by McLaren) consists of the vertices of an "improved" snub cube, obtained from Archimedes' regular snub cube (which is only a 3-design) by slightly shrinking each square face and expanding each triangular face.

One of our constructions gives a sequence of putative spherical t-designs in three dimensions with $N = 12m$ points ($m \ge 2$) where $N = \frac{1}{2} t^2 (1 + o(1))$ as $t \to \infty$.

III. Spherical t-Designs in Four Dimensions

Analogous  results  have  been  obtained  in  four  dimensions  and 
will  be  described  if  time  permits. 

Abstract - A computational algorithm is described for the numerical evaluation of some lattice parameters such as density, thickness, dimensionless second moment (or quantizing constant), etc. By using this algorithm, previously unknown quantizing constants of some interesting lattices can be obtained.

I.  Introduction 

The  complete  geometric  structure  of  a  lattice  can  be  found 
from  the  description  of  its  Voronoi  cell.  The  knowledge  of  the 
Voronoi  cell  solves  at  once  the  problem  of  the  computation  of 
relevant  lattice  parameters  such  as  packing  radius,  covering 
radius,  kissing  number,  center  density,  thickness,  normalized 
second  moment  (or  quantizing  constant). 

The Voronoi cell of certain highly symmetric lattices can
be determined analytically, but no such result is available for
an arbitrary lattice. In this paper we propose an algorithm
which exactly computes the Voronoi cell of an arbitrary
full-rank lattice. The exact knowledge of the Voronoi cell (i.e.,
knowledge of the coordinates of its vertices, edges, etc.)
enables one to compute all the lattice parameters to any
degree of accuracy.

The Voronoi cell of a lattice is an O-symmetric convex polytope,
i.e., a bounded region delimited by a finite number of
hyperplanes symmetric about the origin. The basic elements
of a polytope V are its k-faces, where k is the dimension of
the face. The 0-faces are called vertices of V, the 1-faces
edges of V, and the (d − 1)-faces facets of V. For convenience
we identify V with the d-face and the empty set with the
(−1)-face. To give a complete description of a polytope we must
know all the relations among its faces. For −1 ≤ k ≤ d − 1,
a k-face f and a (k + 1)-face g are incident upon each other if
f belongs to the boundary of g; in this case, f is called a
subface of g and g a superface of f. The d-face represents the
whole polytope and is the only superface of all the facets. The
(−1)-face has no subfaces and is the only subface of all the
vertices. The incidence graph I(V) of V is an undirected graph
defined as follows: for each k-face f (k = −1, 0, 1, ..., d) of V,
I(V) has a node ν(f); if f and g are incident upon each other
then ν(f) and ν(g) are connected by an arc.
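
As a small illustration of an incidence graph, the Python sketch below builds it for the d-dimensional cube, which has the same face structure as any parallelotope. The encoding of faces as sign vectors and the function name `cube_incidence_graph` are our own assumptions for illustration; the empty (−1)-face is left out, so the node count here covers the non-empty faces only.

```python
from itertools import product


def cube_incidence_graph(d):
    """Non-empty faces of the d-cube and the subface/superface arcs between them.

    A face is a vector in {-1, 0, +1}^d: a nonzero entry fixes that coordinate
    on one side of the cube, a zero entry leaves it free. A face with k zeros
    is a k-face.
    """
    faces = list(product((-1, 0, 1), repeat=d))
    dim = {f: f.count(0) for f in faces}
    arcs = set()
    for f in faces:
        for i, s in enumerate(f):
            if s != 0:
                # Freeing one fixed coordinate yields a superface of f.
                g = f[:i] + (0,) + f[i + 1:]
                arcs.add((f, g))
    return faces, dim, arcs


faces, dim, arcs = cube_incidence_graph(3)
print(len(faces))                                              # non-empty faces
print([sum(1 for f in faces if dim[f] == k) for k in range(4)])  # by dimension
```

For d = 3 this gives 8 vertices, 12 edges, 6 facets and the cube itself, i.e. 3^3 = 27 non-empty faces; including the (−1)-face would add one more node joined to every vertex.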

II.  The  diamond-cutting  algorithm 

This algorithm computes the incidence graph of the Voronoi
region V of a lattice. Its name was chosen due to its resemblance
to the procedure for cutting a raw diamond into a brilliant.
Let us consider a lattice Λ defined by an arbitrary basis
{v_1, ..., v_d}. Given a point p, we will denote by h(p) the
hyperplane passing through the point p and normal to the
vector p. The distance of h(p) from the origin is equal to ‖p‖.

Preparation: Given the lattice basis {v_1, ..., v_d}, construct
the parallelotope Q defined by the hyperplanes h(±(1/2) v_i)
for i = 1, ..., d. Q contains the Voronoi cell. The corresponding
incidence graph I(Q) has 3^d nodes. Finally, set V := Q.

Cutting: Consider all hyperplanes
h((λ_1/2) v_1 + (λ_2/2) v_2 + ... + (λ_d/2) v_d),
with λ_i integers, which cut V, and update I(V) by introducing
the nodes corresponding to the new faces and erasing those
corresponding to the faces which are left out. For this operation
we have adapted Edelsbrunner’s algorithm for the incrementation
of arrangements [2].

Finish: Compute vol(V), the volume of V. If vol(V) >
det(Λ)^{1/2}, keep on cutting; else end the algorithm and
output the incidence graph I(V).
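
The three steps can be tried out in two dimensions, where the Voronoi cell is a convex polygon. The Python sketch below is a simplified stand-in, not the authors' implementation: it keeps only the vertex list and clips it against each half-plane, instead of updating an incidence graph with Edelsbrunner's method. The function names, the tolerance `EPS`, and the order in which the integer coefficients are visited are all our assumptions.

```python
import math
from itertools import product

EPS = 1e-9  # numerical tolerance (an assumption of this sketch)


def clip(poly, p):
    """Keep the part of the convex polygon `poly` on the origin side of h(p/2)."""
    c = 0.5 * (p[0] ** 2 + p[1] ** 2)
    out = []
    for i, a in enumerate(poly):
        b = poly[(i + 1) % len(poly)]
        fa = a[0] * p[0] + a[1] * p[1] - c
        fb = b[0] * p[0] + b[1] * p[1] - c
        if fa <= EPS:                      # vertex inside or on the cutting line
            out.append(a)
        if (fa < -EPS and fb > EPS) or (fa > EPS and fb < -EPS):
            t = fa / (fa - fb)             # edge properly crosses the line
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out


def area(poly):
    """Shoelace formula."""
    s = sum(a[0] * b[1] - a[1] * b[0] for a, b in zip(poly, poly[1:] + poly[:1]))
    return abs(s) / 2.0


def voronoi_cell_2d(v1, v2, max_shell=5):
    det = v1[0] * v2[1] - v1[1] * v2[0]    # |det| is the volume of a fundamental cell
    c1 = 0.5 * (v1[0] ** 2 + v1[1] ** 2)
    c2 = 0.5 * (v2[0] ** 2 + v2[1] ** 2)
    # Preparation: parallelogram bounded by h(+-v1/2) and h(+-v2/2).
    cell = []
    for s1, s2 in ((1, 1), (1, -1), (-1, -1), (-1, 1)):
        b1, b2 = s1 * c1, s2 * c2
        cell.append(((b1 * v2[1] - v1[1] * b2) / det,
                     (v1[0] * b2 - b1 * v2[0]) / det))
    # Cutting: visit integer combinations shell by shell.
    for shell in range(1, max_shell + 1):
        for l1, l2 in product(range(-shell, shell + 1), repeat=2):
            if max(abs(l1), abs(l2)) == shell:
                cell = clip(cell, (l1 * v1[0] + l2 * v2[0], l1 * v1[1] + l2 * v2[1]))
        # Finish: stop once the volume has dropped to that of a fundamental cell.
        if area(cell) <= abs(det) + EPS:
            return cell
    raise RuntimeError("volume test not met; increase max_shell")


hexagon = voronoi_cell_2d((1.0, 0.0), (0.5, math.sqrt(3) / 2))
print(len(hexagon), area(hexagon))
```

For the hexagonal lattice the result is a regular hexagon whose area equals the absolute determinant of the basis, which is the quantity written det(Λ)^{1/2} in the Finish step when det(Λ) denotes the Gram determinant.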

III.  Results 

By  introducing  some  additional  information  into  the  nodes  of 
the  incidence  graph,  it  is  possible  to  compute  all  the  lattice 
parameters  once  the  Voronoi  cell  is  found.  In  particular  we 
easily  find  the  packing  radius ,  the  kissing  number ,  the  covering 
radius  and  the  related  parameters.  Finding  the  quantization 
constant  requires  a  slightly  more  complex  procedure  which 
recursively  computes  the  volume  and  second  order  moment 
of  V  about  0  in  terms  of  the  volume  and  of  the  second-order 
moment  of  the  subfaces. 

Using the diamond-cutting algorithm we have computed
some previously unknown values of the quantizing constants
for some particularly interesting lattices. Of special interest
are the previously unknown quantizing constants for the
two locally optimal lattice coverings in R^4 found by Dickson
(Di_{4a}: 0.076993; Di_{4b}: 0.077465) and for a 5-dimensional
extreme lattice covering, which belongs to the class introduced
by Barnes and Trenerry (0.076278). As these lattices do
not improve upon the best known lattice quantizers, the
conjecture that the optimal lattice quantizers are the duals of
the densest lattices still stands.

Most of the computational problems related to lattices are
either known or conjectured to be NP-hard [1, p. 40]. The
principal limitation in the application of the diamond-cutting
algorithm is its exponentially increasing memory requirement.
The possibility of reducing the memory requirement
appears remote, especially if we want to preserve the generality
of the algorithm.

Abstract — We describe a new nonlattice sphere
packing J_20 ⊂ R^20 which is denser than any previously
known sphere packing in R^20. Properties of J_20 are
investigated, and several alternative representations
of the new packing are presented. One of these was
recently recognized by Conway and Sloane as the first
example of the so-called antipode packings, leading
them to the discovery of new densest-known sphere
packings also in dimensions 22 and 44-47.

I.  Introduction 

It has been well known since the celebrated work of Shannon [4]
that the design of efficient transmission codes for band-limited
channels with additive white Gaussian noise is equivalent to
the problem of constructing dense arrangements of nonoverlapping
spheres in R^n. In the study of dense sphere packings
in R^n, a particular effort has been devoted to dimensions
n ≤ 24. For n ≤ 24 major progress was achieved by
John Leech with the construction of his famous Leech lattice
Λ_24, and the sequence of laminated lattices Λ_0, Λ_1, ..., Λ_24,
which may be obtained as cross-sections of Λ_24. Presently, the
laminated lattices are the densest packings known in dimensions
n ≤ 29, except for n = 10, 11, 12, 13. For n = 12 the
Coxeter-Todd lattice K_12 is the densest known packing. For
n = 10, 11, 13 nonlattice packings denser than the laminated
sequence were found by Leech and Sloane [3] in 1970.
Notwithstanding the vast body of research devoted to constructions of
dense sphere packings in recent years (see [1] and references
therein), no further progress for n < 24 has been reported
in the intervening two and a half decades.

II.  The  Construction 

Given a sequence of binary codes C_0, C_1, ..., C_m, consider a
packing Λ consisting of all the points x ∈ Z^n with the following
property: the 2^i's row in the coordinate array of x is
a codeword of C_i for i = 0, 1, ..., m. We use the notation
Λ = C_0 + 2C_1 + ... + 2^m C_m + 2^{m+1} Z^n to describe such
a packing. Now, let C and C* be two orthogonal (n, M_1, d_1),
respectively (n, M_2, d_2), binary codes with d_1, d_2 ≥ n/4 + 2.
We shall use 0, 1 to denote (codes consisting of) the all-zero
and the all-one n-tuples, respectively. The (n, 2^{n−1}, 2) binary
code consisting of all the vectors in F_2^n of even weight,
respectively odd weight, is denoted E_n, respectively O_n. Consider
two sphere packings J_e, J_o ⊂ R^n, defined as follows:

    J_e = 0 + 2C' + 4E_n + 8Z^n
    J_o = 1 + 2C* + 4O_n + 8Z^n                          (1)

where C' = 1 + C. Let J = J_e ∪ J_o. We show that for n ≤ 24,
the center density of J is given by

    δ(J) = (M_1 + M_2) (n + 8)^{n/2} / 2^{3n+1}          (2)

Although (2) holds for all n ≤ 24, it is clear from the condition
d_1, d_2 ≥ n/4 + 2 that the construction of (1) would be most
successful for n divisible by 4. For n = 8, 24, we take the (8, 2^4, 4)
Hamming code and the (24, 2^12, 8) Golay code and, by virtue
of the fact that these codes are self-orthogonal, reproduce
the lattice packings E_8 and Λ_24, respectively. For n = 20
our construction calls for two orthogonal (20, 512, 7) codes.
A (20, 2^9, 7) linear code C is known (cf. [1, p. 248]), and the
question is whether its dual contains another (20, 512, 7)
subcode. This question is settled in the affirmative using a
quaternary representation for C and C^⊥, similar to the
constructions of the Golay code from the (6, 4^3, 4) hexacode, and of
the Nordstrom-Robinson code from the (4, 4^2, 3) quadracode
over F_4. Thus, the codes C and C* may be identified with
certain binary images of two different (5, 4^2, 4) subcodes of the
(5, 4^3, 3) perfect Hamming code over F_4. The resulting
nonlattice packing J_20 has center density 7^10 · 2^{−31} = 0.1315...
This is denser than the best previously known packing Λ_20,
whose center density is 1/8.
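
The three center densities mentioned in this section can be cross-checked with exact arithmetic. The Python sketch below rests on two assumptions of ours rather than on a formula quoted from the paper: that the packing contains (M_1 + M_2) · 2^{n−1} points per period 8Z^n, and that its minimum squared distance is n + 8 for n ≤ 24. The function name `center_density` is likewise hypothetical.

```python
from fractions import Fraction


def center_density(n, m1, m2):
    """Center density of the union of the two packings, under the stated assumptions.

    Assumes (m1 + m2) * 2**(n - 1) points per period 8Z^n and minimum squared
    distance n + 8, so the squared packing radius is (n + 8) / 4.
    """
    if n % 2:
        raise ValueError("this sketch handles even n only, so that rho**n is rational")
    points_per_period = (m1 + m2) * 2 ** (n - 1)
    rho_squared = Fraction(n + 8, 4)
    return points_per_period * rho_squared ** (n // 2) / Fraction(8) ** n


print(center_density(8, 2 ** 4, 2 ** 4))      # Hamming code, E_8
print(center_density(24, 2 ** 12, 2 ** 12))   # Golay code, Leech lattice
print(float(center_density(20, 512, 512)))    # the two (20, 512, 7) codes
```

Under these assumptions the n = 20 value is exactly 7^10 / 2^31, slightly above the value 1/8 quoted for the laminated lattice.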

III.  Properties  of  J20 

We provide several alternative representations of J_20 and
investigate some of its properties. In particular, we show that
J_20 may be constructed as an ℋ-packing, where ℋ is the ring
of Hurwitz quaternions. Furthermore, we prove that although
J_20 is not a lattice it is distance invariant. This allows us to
express the theta series of J_20 in terms of the theta functions
θ_2(z), θ_3(z), θ_4(z) and the weight distribution of the (20, 2^9, 7)
binary code C. Precise enumeration of the first six shells of
the new packing is presented. In particular, the kissing number
of J_20 is shown to be 15360, which is slightly less than
the kissing number of Λ_20, given by 17400. This demonstrates
once again that in general the answers to the packing problem
and the kissing number problem may differ (cf. [1, p. 23]).

Although we establish the distance invariance of J_20, we
were unable to determine whether J_20 has the stronger property
of geometrical uniformity. This was recently settled by
Conway and Sloane [2], who showed that the affine automorphism
group of J_20 is not only transitive on the spheres, but
also doubly transitive on adjacent spheres. In fact, Conway
and Sloane [2] provide a complete characterization of Aut(J_20)
in terms of the automorphism group of the Leech lattice.

Finally, we provide yet another representation of J_20 as
the union of four cosets of 2Λ_20. Conway and Sloane [2] show
that this representation of J_20 is a special case of their new
antipode construction of sphere packings. The antipode
construction of [2] is remarkable in that it readily establishes the
existence of sphere packings in dimensions 22, 44, 45, 46, 47 that
are denser than previously known. All these packings were
discovered in [2]. We note here that in most cases (including
J_16 and J_20), the antipode set is a simplex.

Abstract — A new class of spherical codes is presented
which are designed analogously to laminated
lattice construction. For many minimum angular separations,
these “laminated spherical codes” outperform
the best known spherical codes. In fact, for
fixed dimension k ≤ 49, the density of the laminated
spherical code approaches the density of the (k − 1)-dimensional
laminated lattice Λ_{k−1}, as the minimum
angular separation θ → 0. In particular, the three-dimensional
laminated spherical code is asymptotically
optimal, in the sense that its density approaches
the Fejes Tóth upper bound as θ → 0. The laminated
spherical codes are also structured, which simplifies
decoding.

A spherical code C(k, θ) is a set of points on the surface of
a k-dimensional unit-radius sphere S_k having minimum angular
separation θ. The density of C(k, θ), denoted Δ_{C(k,θ)}, is
the ratio of the surface area of |C(k, θ)| disjoint spherical caps
centered at the codepoints and with angular radius θ/2, to
the surface area of S_k. Let Δ(k, θ) = max_{C(k,θ)} Δ_{C(k,θ)}. Note
that the maximum number of codepoints in any k-dimensional
spherical code with minimum angular separation θ can be
determined directly from Δ(k, θ). We refer to a family of codes
C(k, θ) as asymptotically optimal if Δ_{C(k,θ)} / Δ(k, θ) → 1 as
θ → 0.
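
For k = 3 the density just defined is easy to evaluate, because a cap of angular radius α on the unit sphere covers the fraction (1 − cos α)/2 of its surface. The Python sketch below computes the minimum angular separation and the resulting density for a small code; the octahedron example and the function names are ours, chosen purely for illustration.

```python
import math
from itertools import combinations


def min_angle(code):
    """Minimum angular separation (radians) between distinct unit vectors."""
    def angle(x, y):
        dot = sum(a * b for a, b in zip(x, y))
        return math.acos(max(-1.0, min(1.0, dot)))   # clamp against rounding
    return min(angle(x, y) for x, y in combinations(code, 2))


def density_3d(code):
    """Fraction of the unit sphere in R^3 covered by caps of angular radius theta/2."""
    theta = min_angle(code)
    cap_fraction = (1.0 - math.cos(theta / 2.0)) / 2.0
    return len(code) * cap_fraction


octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(math.degrees(min_angle(octahedron)), density_3d(octahedron))
```

The six octahedron vertices are separated by 90 degrees, so each cap has angular radius 45 degrees and the density is 3(1 − cos 45°), a little under 0.88.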

For fixed dimension k and small minimum angular separation
θ, [Fej59] (k = 3) and [Cox68] (k ≥ 4) provide the tightest
upper bound and [GHSW87] provides the tightest known
lower bound on Δ(k, θ). However, there is a gap between these
bounds as θ → 0. In this paper we introduce a new spherical
code construction analogous to laminated lattice construction.
We call these codes laminated spherical codes. These new
codes have larger asymptotic (for small θ) densities than any
previously known spherical codes.

The laminated spherical codes are obtained by placing
codepoints on concentric (k − 1)-dimensional spheres and
projecting each codepoint onto S_k by adding a kth coordinate
to form a vector of unit norm. The set of points on each
(k − 1)-dimensional sphere is either a (k − 1)-dimensional
laminated spherical code, or another code formed from its deep
holes. By nesting the concentric spheres closely, and placing
codepoints of one sphere at the radial extension of the deep
holes of codepoints of the previous sphere, a method similar to
constructing laminated lattices (e.g., [CS93]) is used to
construct our spherical codes, which we denote by C^Λ. As more
of these concentric spheres are stacked up, codepoints start
spreading out, and the density lessens. To counteract this, a
buffer zone is placed between concentric spheres, and a new,
more tightly packed (k − 1)-dimensional spherical code is placed
in the next sphere. A recursion describes the sequence of radii
necessary to ensure that both the desired minimum angular
separation is maintained and the desired density is obtained.

¹The research was supported in part by the National Science
Foundation, the Joint Services Electronics Program, and
Engineering Research Associates Co.

Our construction has similarities to those of [Yag58] and
[GHSW87] in that a projection from k − 1 dimensions to k
dimensions is used; the difference lies in the placement of points
prior to the projection. Our technique is practical for creating
codes of any size and thus provides a lower bound on
achievable minimum distance as a function of code size.

Let Δ_{C^Λ}(k) = limsup_{θ→0} Δ_{C^Λ(k,θ)}, and let Δ_{Λ_k} be the
density of the sphere packing constructed from the laminated
lattice Λ_k. In the laminated spherical code, layers ((k − 1)-dimensional
spheres) are stacked similarly to layers of lattices
in a laminated lattice, and as a result, Δ_{C^Λ}(k) is equal to the
density of the sphere packing generated by Λ_{k−1}.

Theorem 1  Δ_{C^Λ(k,θ)} = Δ_{Λ_{k−1}} − O(θ^{1/k}).

Corollary 1  C^Λ(3, θ) is asymptotically optimal and the Fejes
Tóth upper bound is asymptotically tight.

Corollary 2  If there exists a family of spherical codes C(k, θ)
whose asymptotic density is higher than Δ_{C^Λ(k,θ)}, then there
exists a (k − 1)-dimensional sphere packing denser than that
generated by Λ_{k−1}.

Theorem 2  There is an optimal decoder for C^Λ(k, θ) using
O(√|C^Λ(k, θ)|) space and O(log |C^Λ(k, θ)|) time, or an optimal
decoder using O(1) space and O(√|C^Λ(k, θ)|) time.

Abstract — σ-trees are a class of geometric structures
that include lattices as a constrained special
case. These structures allow for signal sets that are
more spherical in shape than lattices in spaces of arbitrary
dimensionality. In this paper, the use of σ-trees
in the construction of non-lattice TCM codes is established
and investigated.

I.  Introduction 

A P-level σ-tree signal constellation T_{(1:P)} consists of a set
of 2^P points in N-space that is formed from the direct sum
of an ordered P-member collection of binary constituent sets
(G_1, G_2, ..., G_P). One vector (called a generator), g_{p,j} ∈
G_p, j ∈ {0, 1}, is selected from each constituent set and all
selected vectors are summed to form a point, t, in the signal
constellation. A (P − Q + 1)-level subtree T_{(Q:P)} of T_{(1:P)} is
a σ-tree that is formed from the last (P − Q + 1) constituent
sets; i.e., T_{(Q:P)} = G_Q + G_{Q+1} + ... + G_P. The design of
these constellations is based on an iterative algorithm that
uses training data drawn from multidimensional probability
distribution functions [1,2]. More interestingly, a σ-tree signal
constellation T has a sequence of subtrees that induces
a partition of T into partition chains with expanded
intra-subtree minimum distances.
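
The direct-sum definition can be made concrete in one dimension. In the Python sketch below the two constituent sets are hypothetical values chosen so that the constellation is ordinary 4-PAM; they are not taken from the paper's trained designs, and the function name `sigma_tree` is ours.

```python
from itertools import product


def sigma_tree(constituent_sets):
    """All 2^P points, keyed by binary label j: one generator summed per set."""
    points = {}
    for label in product((0, 1), repeat=len(constituent_sets)):
        points[label] = sum(G[j] for G, j in zip(constituent_sets, label))
    return points


# Hypothetical one-dimensional constituent sets, lowest energy first.
G = [(-1.0, 1.0), (-2.0, 2.0)]

tree = sigma_tree(G)            # two-level tree
subtree = sigma_tree(G[1:])     # subtree formed from the last constituent set
print(sorted(tree.values()))
print(sorted(subtree.values()))
```

Here the full tree has minimum distance 2 while the one-level subtree has minimum distance 4, a toy instance of the expanded intra-subtree distance mentioned in the text.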

II. σ-Tree TCM Codes

A σ-tree coset code C(T/T'; C) is based on a σ-tree T, a
σ-subtree T', and a binary encoder C. Figure (1a) illustrates
the general encoder structure. The order of the constituent
sets G_1, G_2, ..., G_{m+r} plays an essential role in determining a
useful partition of T. For the one-dimensional σ-tree T, the
constituent set with the lowest energy has to be in the first
level of the tree, then continuing in ascending order until we
have the constituent set with the largest energy in the last
level [3]. To transmit m bits per N dimensions, the signal
constellation must be based on an (m + r)-level σ-tree T,
partitioned into 2^{k+r} subsets, each consisting of 2^{m−k} points from
a different coset of the (m − k)-level σ-subtree T'. Constituent
sets are divided into coded and uncoded constituent sets, based
on the data bits used to address them. The direct sum of the
uncoded constituent sets forms the σ-subtree T', while the direct
sum of the coded constituent sets forms a system of cosets.
Of the incoming m data bits, k bits are applied to a binary
encoder to get (k + r) coded bits with which to select a
subset of the σ-tree TCM code. This is performed with each of
the (k + r) coded bits selecting one generator from the binary
coded constituent sets. The direct sum of the generators forms
a coset representative, c. The remaining (m − k) uncoded bits
select a point t' from the σ-subtree T_{(k+r+1:m+r)}, which is added
to the coset representative to form the transmitted signal point
t = c + t'; i.e., t(j) = Σ_{p=1}^{m+r} g_{p,j_p}, where j_p is the pth element
of the (m + r)-tuple binary label j.

Figure (1a): Encoder. The uncoded constituent sets form a subtree.

An optimum subset decoder, shown in Fig. (1b) and developed
for the one-dimensional case, works as follows. For a
(P − Q + 1)-level subtree T_Q of a P-level binary σ-tree T_1,
there are 2^{P−Q+1} parallel transitions between each pair of
states. To choose one of these parallel transitions, (P − Q + 1)
decisions are needed. First, the received channel output is
translated by a coset representative c(j_{Q−1} j_{Q−2} ... j_1) of the
signal subset S(j) assigned to the parallel transitions. Then
the translated channel output is applied sequentially to the
(P − Q + 1) constituent sets of T_Q, starting with G_P, the
constituent set of largest energy. At each stage, a generator is
determined and subtracted from the current stage's input,
and the result is passed to the next stage (constituent set) until
we end with the constituent set G_Q. The direct sum of the
decoded generators g_{P,j_P}, g_{P−1,j_{P−1}}, ..., g_{Q,j_Q} and the coset
representative c(j) forms the decoded signal t.
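
The stage-by-stage decisions can be sketched in Python for a one-dimensional tree. Two things here are assumptions of ours, since the text does not spell them out: each stage picks the generator nearest to its input, and the coset translation is omitted (the whole tree is decoded). The constituent sets are the same hypothetical 4-PAM example as before, not values from the paper.

```python
def successive_decode(y, constituent_sets):
    """Decide one generator per set, largest-energy set first.

    Each decision is subtracted from the running residual before the next
    stage. Returns the binary label and the decoded point.
    """
    residual = y
    label = []
    for G in reversed(constituent_sets):
        # Assumed decision rule: the generator closest to the stage input.
        j = min((0, 1), key=lambda b: abs(residual - G[b]))
        label.append(j)
        residual -= G[j]
    label.reverse()
    return tuple(label), y - residual   # y - residual = sum of decided generators


# Hypothetical one-dimensional constituent sets, lowest energy first.
G = [(-1.0, 1.0), (-2.0, 2.0)]
print(successive_decode(2.6, G))
print(successive_decode(-0.2, G))
```

With these sets the two received values decode to the nearest 4-PAM points; for less regular constituent sets the nearest-generator rule is only a plausible reading of the decoder described above.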

Abstract — Low complexity soft decision decoding
algorithms for Reed-Muller codes and Barnes-Wall
lattices are presented. These algorithms are constructed
by using generalized minimum
distance (GMD) decoding recursively. Evaluation of
the algorithms on the AWGN channel through computer
simulation indicates a slight degradation in performance,
compared to maximum likelihood decoding,
but with considerable reduction in complexity.

I.  Introduction 

Decoding Reed-Muller codes and Barnes-Wall lattices has
become very important because of extensive studies of various
codes and lattices in coded modulation applications in recent
years. Most previous decoding algorithms relied on trellis
decoding. However, a trellis can become very complicated for
codes of longer length or lattices of higher dimension. Therefore,
following the approach suggested by Forney [3], we apply
a hard decision decoding algorithm via GMD decoding [1] to
realise low complexity soft decision decoding of Reed-Muller
codes and Barnes-Wall lattices. In [4] Taipale and Pursley
proposed an improvement to Forney’s GMD decoding algorithm.
However, it still may fail to find an acceptable codeword.
In this paper, we provide a measure of compensation
and present low complexity soft decision decoding algorithms
for Reed-Muller codes and Barnes-Wall lattices by recursively
using GMD decoding. Evaluation of the algorithms on the AWGN
channel through computer simulation indicates a slight degradation
in performance, compared to maximum likelihood decoding,
but with considerable reduction in complexity.

II.  GMD  Decoding  of  Reed-Muller  Codes  and 
Barnes-Wall  Lattices 

We first show that the original majority logic decoding
algorithm for Reed-Muller codes [2] can be easily extended to an
error-and-erasure decoding procedure. Then we can incorporate
the criterion in [4] to derive an improved GMD decoding
procedure. Our soft decision decoding algorithm is then
realized, based upon the (u|u + v) construction of Reed-Muller
codes, by recursively applying the improved GMD decoding
procedure. Namely, if a received vector cannot be decoded
to a codeword in RM(r, m), then decode it to codewords in
RM(r − 1, m − 1) and RM(r, m − 1) respectively. Finally,
an acceptable codeword in RM(r, m) can be obtained. The
complexity of this algorithm for decoding Reed-Muller codes
in the worst case is n = 2^m or 2^{2(m−r)} < n^2, while in the
average case it will be much lower.
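
The (u|u + v) construction that drives the recursion can be written down directly. The Python sketch below builds a generator matrix of RM(r, m) from RM(r, m − 1) and RM(r − 1, m − 1) and checks the minimum distance by brute force. It illustrates the code construction only, not the GMD decoder of the paper, and the function names are our own.

```python
from itertools import product


def rm_generator(r, m):
    """Generator matrix (list of rows) of RM(r, m) via the (u|u+v) construction."""
    n = 2 ** m
    if r == 0:
        return [[1] * n]                                   # repetition code
    if r >= m:
        return [[int(i == j) for j in range(n)] for i in range(n)]  # whole space
    # Rows (u|u) for u in RM(r, m-1), then rows (0|v) for v in RM(r-1, m-1).
    top = [u + u for u in rm_generator(r, m - 1)]
    bottom = [[0] * (n // 2) + v for v in rm_generator(r - 1, m - 1)]
    return top + bottom


def min_distance(gen):
    """Minimum weight over all nonzero codewords (exhaustive, small codes only)."""
    n = len(gen[0])
    best = n
    for coeffs in product((0, 1), repeat=len(gen)):
        if any(coeffs):
            word = [sum(c * row[i] for c, row in zip(coeffs, gen)) % 2 for i in range(n)]
            best = min(best, sum(word))
    return best


for r, m in ((1, 3), (2, 4)):
    gen = rm_generator(r, m)
    print((r, m), len(gen[0]), len(gen), min_distance(gen))
```

For the two codes used in the simulations, RM(1, 3) and RM(2, 4), the sketch reports lengths 8 and 16, dimensions 4 and 11, and minimum distance 2^{m−r} = 4 in both cases.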

It is known that the connection between Barnes-Wall lattices
and Reed-Muller codes can be described by various code
formulas. Therefore the decoding of Barnes-Wall lattices
can be directly derived from the decoding of
Reed-Muller codes.

1This  work  was  supported  by  the  NSF  Grant  NCR-9406043. 


III.  Simulation  Results 

The error performance of the proposed algorithm for decoding
RM(1, 3) and RM(2, 4) on the AWGN channel is shown in
Fig. 1, and the performance for decoding BW_8 (E_8) and
BW_16 (Λ_16) assuming 16QAM signaling on the AWGN channel is
shown in Fig. 2.

Figure 1: Proposed algorithm vs MLD algorithm for decoding
RM(1,3) and RM(2,4)

Figure 2: Bit error rate of coded 16QAM using BW_8 and
BW_16 lattices

Abstract — In this paper we present a new view
of the problem of constellation shaping. Both a new
procedure and an information theoretic analysis are
discussed. The talk presents an approach to understanding
constellation shaping that avoids the “continuous
approximation” analysis of performance. A unique
“type-mapping” approach to shaping is derived and
related to monomial orderings on a ring of polynomials.

I.  Introduction 

Constellation shaping is a method of improving the effectiveness
of digital communications over noisy, bandlimited channels.
This topic has drawn considerable interest recently
[1, 2, 3]. The roots of the topic go back to the paper on Lattice
Codes and Cosets by Conway and Sloane [4], while the
current framework for discussing the topic was outlined by
Forney and Wei [5]. Three basic approaches to the problem
were given by Lang and Longstaff [6], based on “shell mapping”,
Calderbank and Ozarow [7], based on “nonequiprobable
signaling”, and Forney [8], based on “trellis shaping”. The
recent high-speed telephone modem standard, v.34 (“v.fast”),
incorporates a version of shell mapping.

II.  Discrete  Analysis 

The paper presents an approach to constellation shaping
that avoids the “continuous approximation” (CA) analysis
of performance. The crux of the CA method is related to
the asymptotic shaping gain that can be derived from the
entropy power of the uniform distribution. If X is a uniform
random variable on the interval [−A, A], then it has a
“power” P(X) = E(X^2) = A^2/3 and a differential entropy
h(X) = (1/2) log(4A^2); a Gaussian random variable Y, with zero
mean and variance σ^2, has “power” P(Y) = σ^2 and differential
entropy h(Y) = (1/2) log(2πeσ^2). For equivalent entropy,
the Gaussian power is P(Y) = (6/(πe)) P(X), which is 1.53 dB less
than the uniform power. This means that, for transmission over
power constrained channels, Gaussian distributed signaling
has a 1.53 dB advantage over uniformly distributed signals.
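
The 1.53 dB figure follows from equating the two differential entropies and comparing powers. The short Python check below does this numerically; the choice A = 1 is arbitrary (the ratio does not depend on it) and the variable names are ours.

```python
import math

A = 1.0                                       # half-width of the uniform interval
power_uniform = A ** 2 / 3.0                  # E(X^2) for X uniform on [-A, A]
h_uniform = 0.5 * math.log(4 * A ** 2)        # differential entropy in nats

# Variance of the Gaussian with the same differential entropy:
# 0.5 * log(2 * pi * e * sigma2) = h_uniform.
sigma2 = math.exp(2 * h_uniform) / (2 * math.pi * math.e)

gain_db = 10 * math.log10(power_uniform / sigma2)
print(round(gain_db, 2))                      # 1.53
```

The ratio of the two powers is πe/6, which is the asymptotic shaping gain quoted in the text.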

In practice, however, discrete signal sets are always used.
For example, consider how 2 bits might be transmitted over
an ℝ-valued channel. The basic approach is to use the 4-PAM
signal set {−3, −1, +1, +3} with a uniform distribution.
Then the entropy (rate) is 2 bits and the average power is
(1/2)·1 + (1/2)·9 = 5. To achieve a shaping
gain, the signal set is increased and a code is used to induce
a non-uniform distribution. For example, if the 6-PAM
signal set {−5, −3, −1, +1, +3, +5} is used, a gain
can be achieved by selecting a blocklength n = 4 and
signaling with the 2^{nR} = 2^{4·2} = 256 least power signals.
These signals induce a non-uniform marginal distribution
of (1, 28, 35, 35, 28, 1)/128, resulting in a power of
4.875, a 0.11 dB improvement over 4-PAM. The optimum
distribution for rate 2, 6-PAM is iid with marginal distribution
(0.0155, 0.1258, 0.3587, 0.3587, 0.1258, 0.0155); this has
entropy of 2 bits and power 3.7569, a 1.2414 dB improvement!
By going to 8-PAM, a gain of 1.2525 dB is feasible, and the
maximum gain tops out at 1.2526 dB. Thus for a rate of 2 bits,
the 1.53 dB gain is never obtainable (i.e., for the 1.53 dB gain
both n and R must grow to infinity).
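
Both 6-PAM figures in this paragraph can be reproduced numerically. The Python sketch below is our own check, not the paper's procedure: the first function selects the 256 lowest-energy points of the 4-dimensional 6-PAM grid by brute force, and the second assumes that the optimum iid distribution has the form p(x) ∝ exp(−λx²) and finds λ by bisection so that the entropy is 2 bits.

```python
import math
from itertools import product


def least_power_signalling(levels, n, bits):
    """Average power per dimension of the 2**bits lowest-energy points of levels^n."""
    energies = sorted(sum(x * x for x in v) for v in product(levels, repeat=n))
    chosen = energies[: 2 ** bits]
    return sum(chosen) / (len(chosen) * n)


def entropy_constrained_power(levels, rate_bits):
    """Power of p(x) proportional to exp(-lam * x^2) with entropy rate_bits (assumed form)."""
    def dist(lam):
        w = [math.exp(-lam * x * x) for x in levels]
        z = sum(w)
        return [wi / z for wi in w]

    def entropy(p):
        return -sum(pi * math.log2(pi) for pi in p if pi > 0)

    lo, hi = 0.0, 5.0                 # entropy decreases as lam grows
    for _ in range(100):
        mid = (lo + hi) / 2
        if entropy(dist(mid)) > rate_bits:
            lo = mid
        else:
            hi = mid
    p = dist(lo)
    return p, sum(pi * x * x for pi, x in zip(p, levels))


pam4 = (-3, -1, 1, 3)
pam6 = (-5, -3, -1, 1, 3, 5)
baseline = least_power_signalling(pam4, 1, 2)          # uniform 4-PAM
block = least_power_signalling(pam6, 4, 8)             # 256 least-power signals
p_opt, optimum = entropy_constrained_power(pam6, 2.0)
for power in (block, optimum):
    print(power, 10 * math.log10(baseline / power))
```

The block value is independent of how ties among equal-energy points are broken; the marginal distribution is not, which is why the sketch reports powers only.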

The basic methods of studying tradeoffs are developed and
explicit formulas are derived. The relationship to the capacity
of the additive white Gaussian channel is discussed, where it is
shown that shaping techniques bridge the “uniform distribution”
gap.

III.  Type-Mapping 

The basic methods of constellation shaping can be roughly
characterized as forms of coset coding (i.e., codes for which
messages are associated with cosets of a subgroup such as a
linear subspace) and enumerative coding (i.e., codes for which
messages are enumerations of vectors). A unique “type-mapping”
approach, an enumerative technique, is derived and
related to monomial orderings on a ring of polynomials. It is
shown how this approach provides a rate flexible and optimal
tradeoff between peak and average power.

Abstract — There are just three regular polytopes
in Euclidean (n > 4)-space. Their dimensions are
determined, including the distance from the centroid to
the periphery in a random direction, that of a white-Gaussian-noise
vector. As n → ∞, this distance becomes
very predictable. It differs from the distance
near which almost all of the volume and surface of the
polytope lie.

There are infinitely many regular polygons, and there are 5
or 9 regular 3-dimensional solids. In Euclidean 4-space there
are 6 or 16 regular polytopes. The second number (9 or 16)
includes the possibility that faces will intersect one another
internally and may also intersect themselves, as in the case
of the regular pentagram (five-pointed star). But in spaces
of n ≥ 5 dimensions there are only 3 regular polytopes: the
hypercube, which, for n = 2, 3, and 4, is a square, cube,
and tesseract, respectively; the cross polytope, which is the
dual of the hypercube and, for n = 2, 3, and 4, is a square,
octahedron, and 16-hedroid; and the simplex, which is self-dual
and, for n = 2, 3, and 4, is a triangle, tetrahedron, and
pentahedroid.

The discrete set of different signals that might be transmitted
during any signaling interval may be represented by a set
of points in such a space. To each of these signal points belongs
a Voronoi-polytope decision region. White Gaussian noise in
the transmission channel will add random contributions to
the coordinates of the transmitted-signal point, moving it a
somewhat random distance in a uniformly distributed random
direction and causing a reception error if it moves the signal
point outside its Voronoi polytope. It is therefore of interest
to understand how far such a polytope extends in a random
direction and to compare that distance with the rms distance
to a random point distributed uniformly over the volume of
the polytope. Such questions are most easily answered for the
simplest polytopes, i.e., the regular polytopes, and some of
the phenomena exhibited by the regular polytopes will also
occur in the others.

In each case we suppose that the regular polytopes have
edges of unit length. Table I lists the height $H_n$, the distance
$I_{n,k}$ from the center to the k-dimensional faces, the volume $V_n$,
the radius $I_n = I_{n,n-1}$ of the inscribed sphere, the length $L_n$
of a ray in a random direction from the center to the periphery,
the radius $S_n$ of a sphere having the same volume, the rms
distance $\rho_n$ to interior points, and the radius $C_n = I_{n,0}$ of the
circumscribed sphere for the 3 unit-edge regular polytopes.

The last five dimensions appear in the order of increasing
size when n > 15. When n < 15, $\rho_n^{cross} < S_n^{cross}$; when n < 10,
$\rho_n^{cube} < S_n^{cube}$; and when n < 5, $\rho_n^{simp} < S_n^{simp}$. Moreover,
$\rho_n < I_n$ for the cross polytope if n < 4, for the cube if n < 3,
and for the simplex if n = 1. For n ≫ 1 the radius of the
sphere having the same area as any of the three polytopes is
asymptotically equal to its $S_n$.
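
For the unit-edge hypercube the quantities compared above have simple closed forms, so the stated crossovers can be checked numerically. The sketch below is an added illustration, not taken from the abstract. It assumes the standard formulas $\rho_n = \sqrt{n/12}$ (rms distance of a uniform point of the unit cube from its center), $I_n = 1/2$, and $S_n = \Gamma(n/2+1)^{1/n}/\sqrt{\pi}$ (radius of the n-ball of unit volume); the function names are hypothetical.

```python
import math


def cube_rms_radius(n: int) -> float:
    """rms distance from the center to a uniform point of the unit-edge n-cube."""
    return math.sqrt(n / 12.0)


def cube_equal_volume_radius(n: int) -> float:
    """Radius of the n-ball whose volume is 1, the volume of the unit n-cube."""
    # exp(lgamma(.) / n) equals Gamma(n/2 + 1) ** (1/n) without overflow for large n.
    return math.exp(math.lgamma(n / 2.0 + 1.0) / n) / math.sqrt(math.pi)


def cube_inradius(n: int) -> float:
    """Distance from the center to an (n-1)-dimensional face of the unit n-cube."""
    return 0.5


if __name__ == "__main__":
    for n in (2, 3, 9, 10, 16):
        print(n, round(cube_rms_radius(n), 4), round(cube_equal_volume_radius(n), 4))
```

Under these assumptions the rms radius stays below the equal-volume radius for n = 1, ..., 9 and exceeds it from n = 10 on, and it is below the inradius only for n < 3, in line with the statements for the cube.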

Comparison of the fourth moment of the distance from the
center of each regular polytope with the square of the second
moment, $\rho_n^4$, shows that, for n ≫ 1, nearly all of the volume
lies within a thin spherical shell of radius $\rho_n$. The fact that
$L_n < \rho_n$ for large n indicates that nearly all of the volume
of these polytopes lies within a very small hypersolid angle
about the center when n ≫ 1. Setting $\rho_n$ equal to $I_{n,k}$, we find
the largest k such that the boundary faces of dimensionality
less than k lie wholly outside the spherical shell containing
nearly all of the volume of the polytope, viz., k = n/2 for the
simplex, k = 2n/3 for the hypercube, and k = [...] + 1 for the
cross polytope. Full details should appear next year in the
IEEE Transactions on Information Theory.

Forney has proposed an iterated construction called the
squaring construction for simplified derivation and representation
of the Barnes-Wall lattices. He used as a starting partition
chain the two-dimensional infinite two-way partition
$\ldots \mathbb{Z}^2/\mathbb{Z}^2/R\mathbb{Z}^2/2\mathbb{Z}^2/2R\mathbb{Z}^2/\ldots$ with minimum Euclidean
distances $\ldots 1/1/2/4/8/\ldots$, where $R$ is a two-dimensional
rotation operator. We apply this construction to the one-dimensional
infinite two-way partition $\ldots \mathbb{Z}/\mathbb{Z}/2\mathbb{Z}/4\mathbb{Z}/8\mathbb{Z}/\ldots$
with minimum $l_1$-distances $\ldots 1/1/2/4/8/\ldots$, which clearly has
the same properties as the previous partition. The resulting
lattices of dimension $N = 2^n$ for the $l_1$-distance can therefore
be regarded as the duals of the Barnes-Wall lattices of
dimension $2N$ for the Euclidean distance. Since the 2-depth
of each of these lattices is equal to $n$, they necessarily contain
the $2^n\mathbb{Z}^N$ lattice. The coset representatives of these lattices
in $\nu 2^n\mathbb{Z}^N$, where $\nu$ is an arbitrary nonnegative integer, are
good codes for the Lee distance since they outperform the
negacyclic codes in low dimensions. Maximum Likelihood (ML)
soft detection can be performed easily on these lattices and
codes since they have a simple trellis structure. Furthermore,
low complexity detection algorithms such as multistage decoding
can be used without noticeable performance degradation.
This is not the case for negacyclic codes, where only algebraic
hard decoding is performed easily using, for example, Euclid’s
algorithm.

The explicit expression of the lattices obtained by the
Squaring Construction motivates us to consider a more
general construction based on multilevel coding, first proposed
by Imai and Hirakawa. We consider jointly a $\mu$-level
code $C = [C_0, C_1, \ldots, C_{\mu-1}]$ and a finite partition chain
$\mathbb{Z}/q\mathbb{Z}/q^2\mathbb{Z}/\ldots/q^{\mu}\mathbb{Z}$, where each code $C_i$ is an $(N, K_i, d_i)$
block code over the Galois field $GF(q = p^m)$ with Hamming
distance $d_i$, and $m$ is an arbitrary nonnegative integer. An
$N$-dimensional code $\Lambda$ can be defined as the set of integer
$N$-tuples $\lambda$ that are congruent to $c_{\mu-1}q^{\mu-1} + \cdots + c_1 q + c_0$
modulo $q^{\mu}$, where $c_i$ is a codeword in the code $C_i$, i.e., the
coefficients of $q^i$ in the $q$-ary representation of $\lambda$ are codewords
in $C_i$, $0 \le i \le \mu - 1$. The resulting code $\Lambda$ is generally
nonlinear and a necessary condition for it to be a lattice is
that the component codes $C_i$ satisfy the condition $C_i \subset C_{i+1}$.
Also, for a good design of $\Lambda$ the Hamming distances of the
component codes $C_{\mu-1}, \ldots, C_1, C_0$ should be chosen in the
form $q, \ldots, q^{\mu-1}, q^{\mu}$. For the lattices obtained by the Squaring
Construction, $q = 2$ and the component codes are Reed-
Muller codes that satisfy the two previous conditions on the
component codes $C_i$.
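
The membership rule of the multilevel construction can be made concrete with a small sketch. This is an added illustration under stated assumptions: $q$ is taken to be prime so that symbols of $GF(q)$ can be identified with the integers $0, \ldots, q-1$, each component code is given explicitly as a set of tuples, and the function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Codeword = Tuple[int, ...]


def qary_digit_vectors(point: Sequence[int], q: int, levels: int) -> List[Codeword]:
    """Return the digit vectors of `point` modulo q**levels.

    digits[i] holds, coordinate by coordinate, the coefficient of q**i
    in the q-ary representation of the point.
    """
    residues = [x % (q ** levels) for x in point]  # Python's % is non-negative
    digits = []
    for _ in range(levels):
        digits.append(tuple(r % q for r in residues))
        residues = [r // q for r in residues]
    return digits


def in_multilevel_code(point: Sequence[int], codes: Sequence[Set[Codeword]], q: int) -> bool:
    """True if the level-i digit vector of `point` is a codeword of codes[i] for every i."""
    digits = qary_digit_vectors(point, q, len(codes))
    return all(d in code for d, code in zip(digits, codes))


if __name__ == "__main__":
    # Toy example (assumed, not from the text): q = 2, two levels, N = 2.
    c0 = {(0, 0), (1, 1)}                     # repetition code
    c1 = {(0, 0), (0, 1), (1, 0), (1, 1)}     # all binary pairs, so c0 is a subset of c1
    print(in_multilevel_code((1, 3), [c0, c1], 2))
    print(in_multilevel_code((1, 2), [c0, c1], 2))
```

In this toy case the accepted points are exactly the integer pairs whose coordinates have equal parity, which is a lattice, as expected when the nesting condition on the component codes holds.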

In Section I we apply, as we have mentioned before, the
Squaring Construction to the one-dimensional infinite two-way
partition $\ldots \mathbb{Z}/\mathbb{Z}/2\mathbb{Z}/4\mathbb{Z}/8\mathbb{Z}/\ldots$. We generalize the notion
of the 2-depth of a binary lattice introduced by Forney
to the case of nonbinary lattices and nonlinear codes and use
this notion as a measure of the implementation complexity of
the corresponding lattice. Furthermore, we derive the two-dimensional
density of each lattice obtained by this construction
and determine its behavior when the lattice dimension
goes to infinity. We also give an explicit expression of the
asymptotic value of this density as a function of the lattice
2-depth, which we assume fixed, when the lattice dimension
goes to infinity.

For comparison, we present in Sections II and III
the negacyclic and shortened BCH codes designed for the Lee
metric, introduced respectively by Berlekamp and by Roth and
Siegel. We apply Construction A to these codes and derive
dense lattices for the $l_1$-distance. Moreover, we give an explicit
expression of the behavior of the two-dimensional density of
these lattices when their dimension goes to infinity and show
that it is the same in the two cases. We also show that the
expression of the two-dimensional asymptotic density for fixed
lattice 2-depth is identical in both cases to that found for the
lattices obtained by the Squaring Construction.

Multilevel coding is considered in Section IV. As we have
said before, this construction provides a class of lattices and
nonlinear codes which includes the lattices obtained by the
Squaring Construction. We show that when considering
the one-dimensional two-way partition $\mathbb{Z}/2\mathbb{Z}/2^2\mathbb{Z}/\ldots/2^{\mu-1}\mathbb{Z}$
and using binary BCH codes as component codes we obtain an
approximate lattice density which is one quarter of that obtained
in the case of negacyclic and shortened BCH codes. However,
for fixed 2-depth, the asymptotic two-dimensional density is
found to be equal to that obtained for lattices based on negacyclic
and shortened BCH codes.

In Section V we consider two applications of Lee-metric
codes and $l_1$-distance lattices. The first one is concerned with
shift, insertion and deletion error correction in peak-detection
magnetic recording channels. The second one deals with error
correction when transmitting data through the Rician channel.
We have considered two four-dimensional constellations,
with a spectral efficiency of (2 bit/s)/Hz, based on the Schläfli
lattice $D_4$, which is dense for the Euclidean distance, and the
lattice $E_4$, which is dense for the $l_1$-distance and obtained by
the Squaring Construction. The simulation results show that
a coding gain of the order of 2 dB can be achieved by the
constellation based on $E_4$ over that based on $D_4$ for symbol
error rates of the order of $10^{-4}$ when considering a Rician
channel with specific characteristics to be detailed later. We
also show that even if the lattice $\Lambda_8$ obtained by the Squaring
Construction, which is dense for the $l_1$-distance, can achieve a
large asymptotic coding gain over the Gosset lattice $E_8$, which
is dense for the Euclidean distance, it cannot provide positive
coding gains for moderate signal-to-noise ratios because its
kissing number is too large compared to that of $E_8$.

I. Introduction

The redundancy of a source code is the difference between its
expected performance and the optimum performance theoretically
attainable (OPTA). The redundancy problem of source
coding is to investigate the trade-off between the minimum
redundancy over a class of codes having a common parameter
(such as block length) and the common parameter. The
significance of the redundancy problem is obvious when one
takes into account the following facts: first, as compared with
OPTA, the minimum redundancy gives the second-order theoretical
performance and, therefore, is one of the basic problems
in source coding theory; second, the redundancy problem
provides a basis for comparison of different source coding
algorithms; and finally, the minimum redundancy can tell
algorithm designers how much room they have to improve
the performance of their algorithms.

In this paper, we shall assume that the common parameter
associated with the codes considered is block length. We shall
refer to the minimum redundancy over the class of all codes having
block length n and some specified type as the nth-order
redundancy. (In what follows, different names will be given for
different types of codes.) In lossless source coding, the OPTA
is the Shannon entropy and there exists extensive literature
studying the nth-order redundancy. Typical results are: (1)
when the source statistics are known, the nth-order redundancy is
$O(n^{-1})$; (2) when the statistics of a source are unknown, the
nth-order redundancy grows as $O(\ln n / n)$.

In lossy source coding, the OPTA of a memoryless source
$p$ is its rate distortion function $R(p, d)$ when the memoryless
source $p$ is encoded by block codes at fixed distortion
level $d$, i.e., $d$-semifaithful codes, and is its distortion rate
function $d(p, R)$ when $p$ is encoded by block codes at fixed
rate level $R$. If $R(p, d)$ ($d(p, R)$, resp.) is the OPTA, then
the corresponding nth-order redundancy shall be referred to
as the nth-order rate (distortion, resp.) redundancy. Unlike
the case of lossless coding, in lossy source coding only a
few research works on redundancy have been done. Specifically,
Pilc [1] considered for the first time the problem of
nth-order distortion redundancy. For a memoryless source $p$
with finite source and reproduction alphabets, he proved that
the nth-order distortion redundancy of $p$ is upper bounded
by $\left(-(1+\epsilon)\frac{\partial}{\partial R} d(p,R)\,\frac{\ln n}{2n}\right)(1 + o(1))$ and argued that
the nth-order distortion redundancy is lower bounded by
$\left(-\frac{\partial}{\partial R} d(p,R)\,\frac{\ln n}{2n}\right)(1 + o(1))$, where $\frac{\partial}{\partial R} d(p,R)$ is the derivative
of $d(p, R)$ with respect to $R$. Recently, Yu and Speed [2]
proved that for memoryless sources with finite source and reproduction
alphabets, the nth-order universal rate redundancy
is upper bounded by $(KJ + J + 4)\frac{\ln n}{n} + o(n^{-1})$ and
conjectured that $O(\ln n / n)$ is the optimal rate at which the
nth-order rate redundancy converges to 0 as $n \to \infty$, where $J$
and $K$ are the sizes of the source alphabet and the reproduction
alphabet, respectively. Linder, Lugosi and Zeger recently
considered the case of real alphabets and proved that for memoryless
sources, the nth-order universal distortion redundancy
is upper bounded by $O(\sqrt{\ln n / n})$. Unfortunately, Pilc’s argument
for his lower bound is heavily based upon the unjustified
assumption that the output of any block code can be
approximated by an independent and identically distributed
random vector. Whether or not Pilc’s lower bound is true
is a question that has been left open for more than 25 years.
Before our work, therefore, nontrivial lower bounds were still
unknown for either the nth-order rate redundancy or the
nth-order distortion redundancy.

The aim of this paper is to answer the above open questions.
We derive a closed formula for the nth-order distortion
redundancy and prove that the nth-order rate redundancy is
upper bounded by $(\ln n)/n + o((\ln n)/n)$ and lower bounded
by $(\ln n)/2n + o((\ln n)/n)$. As by-products, these results give
positive answers to both Pilc’s open problem and the recent
Yu-Speed conjecture.

II.  Statement  of  Main  Results 

Let $\{X_i\}_{i=1}^{\infty}$ be an i.i.d. source taking values in a source alphabet
$A$ and having a generic distribution $p$. Let $B$ be our
reproduction alphabet. Denote by $J$ and $K$ the cardinalities
of $A$ and $B$, resp. Let $\rho : A \times B \to [0, \infty)$ be a single-letter
distortion measure. Denote by $R(p, d)$ ($d(p, R)$, resp.) the rate
distortion (distortion rate, resp.) function of $p$ with respect
to the fidelity criterion generated by $\rho$. If $C_n \subset B^n$ is a block
code of order $n$ with $|C_n| \le e^{nR}$ (in this paper, coding rates
are measured in nats), the distortion redundancy $D_n(C_n)$ of
$C_n$ is defined as $\rho_n(C_n) - d(p, R)$, where $\rho_n(C_n)$ is the average
distortion resulting from the encoding of $\{X_i\}$ by $C_n$.
The nth-order distortion redundancy $\mathcal{D}_n(R)$ is the minimum of
$D_n(C_n)$ over all block codes $C_n$ of order $n$ with $|C_n| \le e^{nR}$.
For $d$-semifaithful codes of order $n$, we can similarly define
the nth-order rate redundancy $\mathcal{R}_n(d)$. The following two theorems
give the asymptotics of $\mathcal{D}_n(R)$ and $\mathcal{R}_n(d)$.

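Theorem 1 Let $R > 0$. For sufficiently large $n$, we have

$$\mathcal{D}_n(R) = -\frac{\partial}{\partial R} d(p,R)\,\frac{\ln n}{2n} + o\!\left(\frac{\ln n}{n}\right).$$

Theorem 2 Assume $R(p, d) > 0$. Then for sufficiently large $n$,

$$\frac{\ln n}{2n} + o\!\left(\frac{\ln n}{n}\right) \le \mathcal{R}_n(d) \le \frac{\ln n}{n} + o\!\left(\frac{\ln n}{n}\right).$$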
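
To get a feel for the size of these redundancies at finite block length, the leading terms can be evaluated directly. The sketch below is an added illustration: it keeps only the leading terms $\ln n/(2n)$ and $\ln n/n$ and ignores the $o(\ln n/n)$ remainders, so the numbers are indicative only. The function names and the `slope` argument (standing for the derivative of $d(p, R)$ with respect to $R$, which is non-positive) are hypothetical.

```python
import math
from typing import Tuple


def rate_redundancy_leading_bounds(n: int) -> Tuple[float, float]:
    """Leading terms (in nats) of the lower and upper bounds on the nth-order rate redundancy."""
    lower = math.log(n) / (2 * n)
    upper = math.log(n) / n
    return lower, upper


def distortion_redundancy_leading_term(slope: float, n: int) -> float:
    """Leading term -slope * ln(n) / (2n) of the nth-order distortion redundancy."""
    return -slope * math.log(n) / (2 * n)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, rate_redundancy_leading_bounds(n))
```

The two leading bounds on the rate redundancy differ by exactly a factor of two at every block length.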

During  the  process  of  proving  Theorems  1  and  2,  we  develop 
a  deep  theory  on  types  and  d-ball  covering,  which  is  also  very 
interesting  on  its  own. 

Abstract — We derive some information theoretic
inequalities to evaluate channel capacity and
mean square error. We prove an inequality for the
capacity of an additive noise channel with feedback.
We also prove an inequality for mutual information
and mean square error. The inequality
is applied to bound minimum mean square transmission
errors.

Summary 

We first study the capacity of an additive noise channel
with feedback. In general, especially in non-Gaussian
cases, it is a hard task to calculate the capacity exactly.
So it is important to give effective lower or upper bounds
on the capacity. Let $\xi$ be a stochastic process representing
an additive noise. We employ the notation $\xi^*$
to denote a Gaussian process with the same mean and
covariance functions as the process $\xi$. Corresponding to
the channel with additive noise $\xi$, we consider a Gaussian
channel with additive noise $\xi^*$.

Theorem 1 Assume that the channels are with feedback.
Then, under an average power constraint, the capacity
$C$ of the channel with additive noise $\xi$ is bounded
by

$$C^* \le C \le C^* + H(\xi\|\xi^*), \qquad (1)$$

where $C^*$ is the capacity of the corresponding Gaussian
channel and $H(\xi\|\xi^*)$ is the relative entropy (or information
divergence) of $\xi$ with respect to $\xi^*$.

In the case where the channels are without feedback,
(1) has been obtained in [2].

It is interesting to recall the duality between the result
(1) on the channel capacity and a result due to Binia et
al. [1] on the rate distortion function. Denote by $R(D; \xi)$
the rate distortion function of a stochastic process $\xi$ with
mean square distortion. Then it is known that

$$R(D;\xi^*) - H(\xi\|\xi^*) \le R(D;\xi) \le R(D;\xi^*), \quad D > 0,$$

or equivalently

$$D(R + H(\xi\|\xi^*);\xi^*) \le D(R;\xi) \le D(R;\xi^*), \quad R > 0,$$

where $D(R;\xi)$ is the distortion rate function of $\xi$.

The second result relates the mean square error to the
mutual information. We denote by $d(\xi,\eta)^2$ the mean
square error between stochastic processes (or random
variables) $\xi$ and $\eta$.

Theorem 2 The mean square error is lower bounded by

$$d(\xi,\eta)^2 \ge D(I(\xi,\eta) + H(\xi\|\xi^*);\xi^*), \qquad (2)$$

where $I(\xi,\eta)$ is the mutual information between $\xi$ and $\eta$.

The results (1) on the channel capacity $C$ and (2) on
the mean square error $d(\xi,\eta)^2$ are expressed in terms
of the capacity $C^*$ of the related Gaussian channel, the
distortion rate function $D(\cdot\,;\xi^*)$ of the related Gaussian
process, the mutual information, and the relative entropy.
Results on the capacity of Gaussian channels are available
in the literature [3] (and references therein). The rate
distortion function $R(D;\xi^*)$ of the Gaussian process $\xi^*$
is given in a closed form of $D$, and $D(R;\xi^*)$ is the inverse
function of $R(D;\xi^*)$. The relative entropy $H(\xi\|\xi^*)$ may
be regarded as the non-Gaussianness of $\xi$.

The inequality (2) is useful to evaluate the reproduction
error in information transmission over a channel.

Theorem 3 Let a stochastic process $\xi$ be a message to
be transmitted over a channel of capacity $C$. Then the
minimum mean square transmission error $\Delta(\xi)^2$ over the
channel is bounded by

$$\Delta(\xi)^2 \ge D(C + H(\xi\|\xi^*);\xi^*).$$

If a message $\xi$ is a random variable with variance $\sigma^2$, then

$$\Delta(\xi)^2 \ge \sigma^2 \exp\{-2[C + H(\xi\|\xi^*)]\}.$$
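
The scalar case of Theorem 3 is easy to evaluate numerically. The sketch below is an added illustration, assuming that rates and the relative entropy are measured in nats and that the Gaussian distortion rate function of a variance-$\sigma^2$ random variable is $\sigma^2 e^{-2R}$; the function names are hypothetical.

```python
import math


def gaussian_distortion_rate(sigma2: float, rate_nats: float) -> float:
    """Distortion rate function of a Gaussian random variable with variance sigma2."""
    return sigma2 * math.exp(-2.0 * rate_nats)


def mse_lower_bound(sigma2: float, capacity_nats: float, divergence_nats: float = 0.0) -> float:
    """Lower bound on the minimum mean square transmission error for a scalar message.

    divergence_nats is the relative entropy of the message with respect to the
    Gaussian variable with the same mean and variance (0 for a Gaussian message).
    """
    return gaussian_distortion_rate(sigma2, capacity_nats + divergence_nats)


if __name__ == "__main__":
    # Gaussian message with variance 4 over a channel of capacity ln 2 nats.
    print(mse_lower_bound(4.0, math.log(2)))
```

A larger relative entropy only loosens the bound: a strongly non-Gaussian message of the same variance may be reproduced with a smaller error than a Gaussian one.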

Abstract — It is shown by developing the previous
technique that the critical distortion $d_c$ of the q-ary
Potts models on a number of lattices is related
to the radius of convergence $R$ of Mayer's series
by $d_c = (q-1)R/(1+R)$. A recursive approach is applied to
estimate $R$ as well as $d_c$ by using the matrix
representation of Mayer's series. For those Potts models
for which Mayer's series are not available, we
derive a unified form of lower bound for $d_c$.

SUMMARY 

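A q-ary Potts model on $\mathbb{Z}^k$ is a random field $X = \{X_i, i \in \mathbb{Z}^k\}$
with the following Gibbs distribution

$$P(x) = \frac{1}{Z}\exp\Big\{J \sum \delta(x_i, x_j)\Big\} \qquad (1)$$

where

$$\delta(x_i, x_j) = \begin{cases} 0 & \text{if } x_i = x_j \\ 1 & \text{if } x_i \ne x_j \end{cases}, \qquad x_i \in Q = \{0, 1, 2, \ldots, q-1\},$$

and the summation is taken over all the nearest
neighboring pairs of sites on the lattice $\mathbb{Z}^k$. The Ising
model is recovered when q=2.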

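The per-site ε-entropy for $X^{V_n} = \{X_i, i \in V_n\}$ on a
finite subset $V_n = \{i = (i_1, \ldots, i_k), |i_j| \le n\} \subset \mathbb{Z}^k$ is defined
by

$$R_{X^{V_n}}(d) = \inf \frac{1}{|V_n|} I(X^{V_n}; Y^{V_n}),$$

where the inf is over all random fields $Y^{V_n}$ such
that

$$\frac{1}{|V_n|}\sum_{i \in V_n} E\,\rho(X_i, Y_i) \le d,$$

where $\rho(\cdot,\cdot)$ is the Hamming distance on $A \times A$. Then the per-site
ε-entropy for $X$ is defined by

$$R_X(d) = \lim_{n \to \infty} R_{X^{V_n}}(d)$$

if the limit exists.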

Bassalygo and Dobrushin [1] proved for a wide class
of q-ary random fields on $\mathbb{Z}^k$ that for sufficiently
small d:

$$R_X(d) = H_\infty(X) - \varphi(d) \qquad (2)$$

where $H_\infty(X)$ is the entropy rate of the random field $X$,
and $\varphi(d) = -d\log d - (1-d)\log(1-d) + d\log(q-1)$.

The critical distortion $d_c$ is defined by

$$d_c = \sup\{d : R_X(d) = H_\infty(X) - \varphi(d)\} \qquad (3)$$

They proved the existence of positive $d_c$ using the cluster
expansion method, but did not provide any estimates or
bounds of $d_c$.

In this work we show that for the random field in (1),
$d_c$ is related to the radius of convergence $R$ of the
Mayer's series associated with the Potts model by:

$$d_c = (q-1)\frac{R}{1+R} \qquad (4)$$

We have provided a recursive approach [2] to compute
$R$ as well as $d_c$ by the matrix representation of
Mayer's series. In particular, we have applied this
method to calculate $d_c$ for Ising models defined on
several 2- or 3-dimensional even lattices. Let N denote
the number of the nearest neighboring sites of each
site on the lattice. We found that $d_c$ decreases as N
or the dimension of the lattice increases.
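
Relation (4) and the function $\varphi(d)$ from (2) are simple enough to evaluate directly once a value of the radius of convergence is available. The sketch below is an added illustration; natural logarithms are an assumption (the text does not fix the base), the function names are hypothetical, and the radius passed in the example is an arbitrary number, not a computed value for any particular lattice.

```python
import math


def phi(d: float, q: int) -> float:
    """phi(d) = -d log d - (1 - d) log(1 - d) + d log(q - 1), in nats, for 0 <= d <= 1."""
    def xlogx(x: float) -> float:
        # Convention 0 * log 0 = 0.
        return 0.0 if x == 0.0 else x * math.log(x)
    return -xlogx(d) - xlogx(1.0 - d) + d * math.log(q - 1)


def critical_distortion(q: int, radius: float) -> float:
    """d_c = (q - 1) R / (1 + R) for a q-ary Potts model with Mayer-series radius R."""
    return (q - 1) * radius / (1.0 + radius)


if __name__ == "__main__":
    r = 0.2  # arbitrary illustrative radius of convergence
    d_c = critical_distortion(2, r)
    print(d_c, phi(d_c, 2))
```

For $d \le d_c$ the per-site ε-entropy would then be obtained as the entropy rate minus `phi(d, q)`, following (2).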

In the case that the series expansions are not
available we can bound $d_c$ using Ruelle's Theorem [3]
from statistical mechanics. We derive the following
lower bound for $d_c$:

$$d_c \ge \begin{cases} \dfrac{(q-1)\,b_1^{2k}}{1 + b_1^{2k}} & \text{if } J < 0 \\[2ex] \dfrac{(q-1)\,b_2^{2k}}{1 + b_2^{2k}} & \text{if } J > 0 \end{cases}$$

where

$$b_1 = \frac{q-1 - [(q-1)(q-1+e^{J})(1-e^{J})]^{1/2}}{(q-1)(q-2+e^{J})}, \qquad b_2 = \frac{(q-2+e^{J}) - [(q-1+e^{J})(e^{J}-1)]^{1/2}}{(q-2)(q-1+e^{J})+1}.$$

When q=2, k=1, this bound coincides with the exact value
of Gray's critical distortion.

Abstract — Using a codebook $C$, a source sequence is
described by the codeword that is closest to it according
to the distortion measure $d_0(x, \hat{x}_0)$. Based on this
description, the source sequence is reconstructed to
minimize the distortion measured by $d_1(x, \hat{x}_1)$, where
in general $d_1(x, \hat{x}_1) \ne d_0(x, \hat{x}_0)$. We study the minimum
resulting $d_1(x, \hat{x}_1)$-distortion between the reconstructed
sequence and the source sequence as we optimize
over the codebook subject to a rate constraint.
Using a random coding argument we derive an upper
bound on the resulting distortion. Applying this
bound to blocks of source symbols we construct a sequence
of bounds which are shown to converge to the
least distortion achievable in this setup. This solves
the rate distortion dual of an open problem related to
the capacity of channels with a given decoding rule:
the mismatch capacity.

Addressing a different kind of mismatch, we also
study the mean squared error description of non-Gaussian
sources with Gaussian codebooks. It is
shown that the use of a Gaussian codebook to compress
any ergodic source results in an average distortion
which depends on the source via its second moment
only. The source with a given second moment
that is most difficult to describe is the memoryless
zero-mean Gaussian source, and it is best described
using a Gaussian codebook. Once a Gaussian codebook
is used, we show that all sources of a given second
moment become equally hard to describe.

I.  Mismatched  Description 
The design and implementation of lossy block source compression
is usually done in three steps. The first step is to find a
single-letter distortion measure that best describes the needs
and sensitivities of the end-user (reconstructor). Based on
this distortion measure and on the probability law that governs
the source behavior, a codebook is designed to minimize
the average distortion subject to some rate and complexity
constraints. Finally the source output sequence is described
by the index of the closest codeword to the source sequence
according to the distortion measure. The end-user then reconstructs
the source sequence based on the index, the codebook
and the distortion measure.

Our interest is in a situation where the distortion measure
$d_1(x, \hat{x}_1)$ that best describes the sensitivities of the end-user
is different from $d_0(x, \hat{x}_0)$ according to which the source is
encoded. Such a situation can arise if encoding according
to $d_0(x, \hat{x}_0)$ is easier to implement than encoding to minimize
$d_1(x, \hat{x}_1)$, or when one attempts to reconstruct a source
that was compressed using a standard lossy compression algorithm
over which one has no control. The codebook and the
two distortion measures are known to the end-user. Only the
codeword nearest to the source sequence, not the source sequence
itself, is available to him, and he needs to reconstruct
the source sequence to minimize the $d_1(x, \hat{x}_1)$-distortion. A
formal statement of the problem follows.

A blocklength $n$ code $C$ of size $2^{nR}$ over a finite alphabet $\hat{\mathcal{X}}_0$
is used to encode a memoryless source of law $p(x)$ that takes
values in a finite alphabet $\mathcal{X}$. A source sequence $\mathbf{x}$ is described
by the codeword $\hat{\mathbf{x}}_0(i)$ that is nearest to $\mathbf{x}$ according to the
single-letter bounded distortion function $d_0(x, \hat{x}_0)$. Based on
the description $\hat{\mathbf{x}}_0(i)$ and the knowledge of the codebook $C$,
we wish to reconstruct the source sequence to minimize the
average distortion defined by the bounded distortion function
$d_1(x, \hat{x}_1)$, where in general $d_1(x, \hat{x}_1) \ne d_0(x, \hat{x}_0)$. In fact, the
reconstruction alphabets $\hat{\mathcal{X}}_0$ and $\hat{\mathcal{X}}_1$ may well be different.
We study the minimum, over all codebooks $C$ of rate $R$, of the
average distortion between the reconstructed sequence $\hat{\mathbf{x}}_1(i)$
and the source sequence $\mathbf{x}$. This quantity is denoted by $D_1(R)$.
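
The two-stage procedure in this problem statement (encode with one distortion measure, reconstruct for another) can be illustrated for blocklength 1 on a small alphabet. The sketch below is an added toy example, not the paper's construction: the codebook is fixed rather than optimized, ties at the encoder are broken by codebook order, and all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple


def mismatched_reconstruction(
    source_pmf: Dict[Hashable, float],
    codebook: Sequence[Hashable],
    d0: Callable,
    recon_alphabet: Sequence[Hashable],
    d1: Callable,
) -> Tuple[Dict[Hashable, Hashable], float]:
    """Encode by nearest codeword under d0, then pick per-codeword reconstructions for d1.

    Returns the map codeword -> chosen reconstruction and the resulting average d1-distortion.
    """
    # Encoder: each source letter goes to its nearest codeword under d0.
    cells: Dict[Hashable, List[Hashable]] = {c: [] for c in codebook}
    for x in source_pmf:
        nearest = min(codebook, key=lambda c: d0(x, c))
        cells[nearest].append(x)

    # Decoder: for each used codeword, choose the reconstruction letter that
    # minimizes the expected d1-distortion over the letters mapped to it.
    reconstruction: Dict[Hashable, Hashable] = {}
    average_d1 = 0.0
    for c, cell in cells.items():
        if not cell:
            continue

        def cost(y, cell=cell):
            return sum(source_pmf[x] * d1(x, y) for x in cell)

        best = min(recon_alphabet, key=cost)
        reconstruction[c] = best
        average_d1 += cost(best)
    return reconstruction, average_d1


if __name__ == "__main__":
    pmf = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
    recon, dist = mismatched_reconstruction(
        pmf,
        codebook=[0, 3],
        d0=lambda x, c: abs(x - c),                 # encoder metric
        recon_alphabet=[0, 0.5, 1, 1.5, 2, 2.5, 3],
        d1=lambda x, y: (x - y) ** 2,               # end-user metric
    )
    print(recon, dist)
```

In the toy run the decoder does not output the codewords themselves but moves each reconstruction toward the middle of its encoding cell, which is the effect of knowing the codebook and both distortion measures.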

Using a random coding argument, an upper bound on
$D_1(R)$ is derived. We show that this bound is in general not
tight, and derive a monotonic sequence of upper bounds which
converges to $D_1(R)$. This solves the rate distortion dual of an
open problem related to the capacity of channels with a given
decoding metric [1].

II.  A  Rate  Distortion  Saddlepoint 
Here we focus on a different kind of mismatch, one where
the source distribution is not the one for which the codebook
was optimized. We consider real-valued ergodic sources and
the mean squared error distortion measure. We study the
performance that one can expect when one describes such a
source using a “Gaussian codebook”, where a Gaussian codebook
is a random codebook whose codewords are drawn independently
and uniformly over an n-dimensional Euclidean
sphere. Using a result due to Wyner [2] we show the following.
Theorem 1 Consider the ensemble of codebooks generated by
drawing $2^{nR}$ codewords independently and uniformly over the
n-dimensional sphere of radius $r_n$ centered about the origin.
Let $\mathbf{x}$ be an n-length source sequence generated by an ergodic
source of second moment $\sigma^2$, and let $0 < D < \sigma^2$.

(a) If $R < \frac{1}{2}\log(\sigma^2/D)$ then, irrespective of the radii,

$$\Pr\left(\exists\, \hat{\mathbf{x}} \in C \text{ s.t. } \|\mathbf{x} - \hat{\mathbf{x}}\|^2 \le nD\right) \to 0 \quad \text{as } n \to \infty.$$

(b) If $R > \frac{1}{2}\log(\sigma^2/D)$ and $r_n = \sqrt{n(\sigma^2 - D)}$ then

$$\Pr\left(\exists\, \hat{\mathbf{x}} \in C \text{ s.t. } \|\mathbf{x} - \hat{\mathbf{x}}\|^2 \le nD\right) \to 1 \quad \text{as } n \to \infty.$$
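
The quantities in Theorem 1 are straightforward to compute, and codewords of such a codebook can be drawn by normalizing a Gaussian vector. The sketch below is an added illustration: the logarithm is taken to base 2 on the assumption that a codebook of size $2^{nR}$ means the rate is in bits, and the function names are hypothetical.

```python
import math
import random
from typing import List


def rate_threshold_bits(sigma2: float, distortion: float) -> float:
    """Rate (1/2) log2(sigma^2 / D) separating cases (a) and (b)."""
    return 0.5 * math.log2(sigma2 / distortion)


def codebook_radius(n: int, sigma2: float, distortion: float) -> float:
    """Sphere radius r_n = sqrt(n (sigma^2 - D)) used in case (b)."""
    return math.sqrt(n * (sigma2 - distortion))


def sample_on_sphere(n: int, radius: float, rng: random.Random) -> List[float]:
    """Draw one point uniformly on the n-dimensional sphere of the given radius."""
    # A standard Gaussian vector has a rotation-invariant direction.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in g))
    return [radius * v / norm for v in g]


if __name__ == "__main__":
    rng = random.Random(0)
    n, sigma2, d = 8, 5.0, 1.0
    r_n = codebook_radius(n, sigma2, d)
    codeword = sample_on_sphere(n, r_n, rng)
    print(rate_threshold_bits(sigma2, d), r_n, codeword[:3])
```

Note that the codebook size grows as $2^{nR}$, so an exhaustive simulation of the theorem is only practical for small $n$.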

Abstract — The Shannon lower bound on the rate
distortion function of sub-Gaussian vectors is considered.
It can be shown that the Shannon lower bound
can be decomposed into a sum of the rate distortion
function of a corresponding Gaussian vector and a
correction term which accounts for the differing distribution
shape. This correction term is numerically
evaluated.

I.  Introduction 

Sub-Gaussian random vectors can serve as source models
for speech samples, coefficients in image transform or subband
coding, or as models for displaced frame differences as they occur
in hybrid video coding. This is mainly due to two aspects.
First, sub-Gaussian vectors show elliptically shaped contours
of equal distribution, thus belonging to the class of spherically
invariant random vectors; second, the univariate distribution
shape is peaky and “thick-tailed” compared to the Gaussian
distribution. Both spherical invariance and a peaky distribution
fit well to the actual statistics of a wide variety of sources.

Leung  and  Cambanis  [1]  gave  the  Shannon  lower  bounds 
of  spherically  invariant  random  processes  and  vectors.  Except 
for  using  the  squared  error  distortion  criterion  their  work  was 
very  general.  In  this  contribution  the  Shannon  lower  bounds 
of  sub-Gaussian  random  vectors  will  be  evaluated.  In  order  to 
keep  the  average  distortion  finite,  the  absolute  error  criterion 
is  employed  using  results  from  [2]. 

II.  Sub-Gaussian  random  vectors 

Let $X$ be a random vector with pdf $f(x)$. The (multivariate)
characteristic function (cf) of $X$ is then defined by
$\Phi(t) = E\,e^{j t^T X}$, where $t$ denotes a vector of the same dimension
as $X$ and $E$ denotes expectation.

A spherically invariant random vector (SIRV) is a random
vector defined by the property that its characteristic function
(cf) can always be written as

$$\Phi(t) = h(u) \quad \text{with} \quad u = t^T C t, \qquad (1)$$

where $C$ is a positive definite matrix. Note that Gaussian
random vectors are included here as a special case with $h(u) =
\exp(-u/2)$ and $C$ being the covariance matrix of the vector.

Spherically invariant vectors are completely specified by the
univariate marginal density function and the linear statistical
dependencies (expressed in terms of $C$) between the components.

A random vector X is called sub-Gaussian if and only if
its characteristic function is given by

$$\Phi_X(t) = \exp\left\{-\left(t^T C t\right)^{\alpha/2}\right\}, \qquad (2)$$

where C is a positive definite matrix and $1 \le \alpha \le 2$ [3].

Note that sub-Gaussian random vectors are SIRVs [4] and
that zero-mean Gaussian random vectors are included in the
case $\alpha = 2$. Compared to Gaussian random vectors of the same
dimension, sub-Gaussian vectors are parameterized with only
one additional parameter $\alpha$, which accounts for different
distribution shapes.
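As a rough numerical illustration of how the parameter $\alpha$ shapes the distribution, the sketch below evaluates the characteristic function and inverts its one-dimensional version by numerical integration. It assumes the form $\exp\{-(t^T C t)^{\alpha/2}\}$ for the cf; the function names, the integration grid and the test point are illustrative choices, not taken from the paper.

```python
import numpy as np

def sub_gaussian_cf(t, C, alpha):
    """Characteristic function exp{-(t^T C t)^(alpha/2)} (assumed form of eq. (2))."""
    t = np.asarray(t, dtype=float)
    u = float(t @ np.asarray(C, dtype=float) @ t)
    return np.exp(-u ** (alpha / 2.0))

def marginal_pdf(x, c, alpha, t_max=200.0, step=1e-3):
    """Invert the one-dimensional cf exp{-(c t^2)^(alpha/2)} numerically.

    For a real, even cf: f(x) = (1/pi) * integral_0^inf cf(t) cos(t x) dt.
    """
    t = np.arange(0.0, t_max + step, step)
    y = np.exp(-(c * t**2) ** (alpha / 2.0)) * np.cos(t * x)
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t))  # trapezoidal rule
    return integral / np.pi

# alpha = 1 should reproduce a Cauchy density, alpha = 2 a Gaussian one.
cauchy = marginal_pdf(0.5, c=1.0, alpha=1.0)   # compare with 1 / (pi * (1 + x^2))
gauss = marginal_pdf(0.5, c=1.0, alpha=2.0)    # compare with the N(0, 2c) density
```

With this parameterization the case $\alpha = 2$ corresponds to a Gaussian with variance $2c$ per component, and $\alpha = 1$ gives the thick-tailed Cauchy shape.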

III.  Evaluation  of  Shannon  lower  bounds 

Following [1], [2], the Shannon lower bound $R_{SL}(D)$ of a
sub-Gaussian random vector with cf (2) decomposes into a
sum of the Shannon lower bound $R_{SL,G}(D)$ of a Gaussian
vector with the same matrix C and a correction term $K(n, \alpha)$
depending only on the vector dimension n and the parameter
$\alpha$. Interestingly, the correction term depends neither on
the matrix C nor on the distortion D.

Because sub-Gaussian random vectors are SIRVs, the
bounds can be determined via the Hankel transform of the
function $\exp(-|t|^{\alpha})$ (see e.g. [5]). Values for the correction term
in the case $\alpha = 1$ are given in the table. In this case the
sub-Gaussian distribution is spherically invariant with Cauchy
marginals. For $\alpha$ between 1 and 2 the correction term falls into
the range between zero and the corresponding value in the table
and can be determined (at least in principle) numerically.


n         1     2     4     8     16    32    ∞
K(n, 1)   1.10  0.94  0.79  0.66  0.57  0.51  0.39

Tab. 1: Correction term (in bit/sample) for sub-Gaussian
random vector of dimension n with $\alpha = 1$.

Adding the (well-known) Shannon lower bound of a Gaussian vector
then yields the Shannon lower bound of a sub-Gaussian vector
for any parameters C, $\alpha$ and n.
Abstract — Traditional subband coding, where each
subband  is  encoded  independently,  has  been  shown  by 
Fischer  to  be  suboptimal  in  the  rate-distortion  sense. 
We  show  that  if  we  use  prediction  across  subbands, 
the  resulting  coder  is  asymptotically  rate-distortion 
optimal  at  high  rate. 


I.  Introduction 

In a typical subband coding system, the signal is first decomposed
into subsignals, then a bit allocation algorithm is used
to determine the rate to encode each subsignal, and finally
each subsignal is encoded independently of the others. It has
been shown [1, 2] that for Gaussian sources and for ideal
(brick-wall) subband filters, subband coding can achieve a coding
gain over PCM at high rate. Recently, Fischer [3] showed that
subband coding for Gaussian sources with QMF filters is generally
suboptimal in the rate-distortion sense.

Theorem (Fischer 1992): Consider a Gaussian process $x_n$
with spectral density $S_x(f)$, which is decomposed by a QMF
system into the subsignals $s_n$ and $d_n$. If we encode $s_n$ and $d_n$
independently of each other, the optimal performance satisfies

$$\sigma_x^2\gamma_x^2 \le 2\sqrt{\sigma_s^2\gamma_s^2\,\sigma_d^2\gamma_d^2} = \exp\left\{\int_{-0.25}^{0.25} \log_e\left[A(f) + S_x(f)\,S_x(f+0.5)\right] df\right\},$$

where

$$A(f) = |H(f)|^2\,|H(f+0.5)|^2\,\left[S_x(f) - S_x(f+0.5)\right]^2.$$

The inequality is strict if $A(f) > 0$ on a subset of $[-0.25, 0.25]$
of positive measure. ∎

The implication of the inequality is that the performance
of subband coding at high rate is strictly lower bounded by
the rate-distortion function of the source except for several
special cases where $A(f) = 0$, e.g., when the filter $H(f)$ is ideal
(hence also the complementary filter $G(f)$), or when $S_x(f)$ is
symmetric about $f = 1/4$.


II.  Subband  Coding  with  Crossband  Prediction 

Consider the subband coder with crossband prediction as
shown in Fig. 1. We first encode $s_n$ to get $\hat{s}_n$. We then use a
linear predictor to generate $\hat{d}_n$, a predicted version of $d_n$, from
$\hat{s}_n$ and then encode the prediction error $e_n = d_n - \hat{d}_n$. To
calculate the energy of the prediction error, we assume that $s_n$ is
available at the predictor input.

Figure 1: A subband coder with crossband prediction.

This obviously cannot be the case at the decoder, and hence the
results in this summary are only asymptotically exact at high rate.
The mean square prediction error achieved using an optimum linear
predictor is (see, for example, pages 432-435 of [4])

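$$E[e_n^2] = E[(d_n - \hat{d}_n)^2] = \int_{-0.5}^{0.5} \left[S_d(f) - |S_{ds}(f)|^2\, S_s^{\oplus}(-f)\right] df,$$

where

$$a^{\oplus}(f) = \begin{cases} 1/a(f) & \text{if } a(f) > 0 \\ 0 & \text{if } a(f) = 0. \end{cases}$$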


It is clear that the prediction error is also Gaussian. If we
encode the components $s_n$ and $e_n$ using the optimum bit
allocation, the resulting distortion is

$$D(R) = 2\sqrt{\sigma_s^2\gamma_s^2\,\sigma_e^2\gamma_e^2}\; 2^{-2R}.$$

We can then prove the following theorem:

Theorem: Let $x_n$ be a Gaussian process with spectral density
$S_x(f)$. It is decomposed using a two-band QMF system
into $s_n$ and $d_n$, and then optimally encoded using the crossband
predictive coder. The equality

$$2\sqrt{\sigma_s^2\gamma_s^2\,\sigma_e^2\gamma_e^2} = \sigma_x^2\gamma_x^2 \qquad (1)$$

holds, which implies that the subband coding system with
crossband prediction is asymptotically optimal in the
rate-distortion sense at high rates.

Proof: Proceed with

$$2\sqrt{\sigma_s^2\gamma_s^2\,\sigma_e^2\gamma_e^2} = \exp\left\{\int_{-0.25}^{0.25} \log_e\left(4S_s(2f)S_d(2f) - 4|S_{ds}(2f)|^2\right) df\right\}. \qquad (2)$$

As shown in [3],

$$4S_s(2f)S_d(2f) = S_x(f)S_x(f+0.5) + A(f). \qquad (3)$$

Using the equality $G(f) = e^{-j2\pi f}H(-f-0.5)$, we have

$$4|S_{ds}(2f)|^2 = \left|e^{-j2\pi f}H(-f-0.5)H(-f)S_x(f) + e^{-j2\pi(f+0.5)}H(-f)H(-f-0.5)S_x(f+0.5)\right|^2 = A(f). \qquad (4)$$

Substituting (3) and (4) into (2), we get the desired result. ∎
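The two exponents that appear in the theorems can be compared numerically. The sketch below is an illustration only: it assumes a unit-variance Gaussian AR(1) source and the two-tap averaging filter with $|H(f)|^2 = \cos^2(\pi f)$ as a stand-in QMF low-pass filter (neither is specified in the summary), and it evaluates the integrals with a midpoint rule. It checks that the exponent for independent coding exceeds the entropy power $\exp\int_{-0.5}^{0.5}\log_e S_x(f)\,df$, while the exponent obtained after substituting (3) and (4) into (2) matches it.

```python
import numpy as np

RHO = 0.9  # AR(1) correlation coefficient (illustrative choice)

def S_x(f):
    """Spectral density of a unit-variance Gaussian AR(1) process."""
    return (1 - RHO**2) / (1 - 2 * RHO * np.cos(2 * np.pi * f) + RHO**2)

def H2(f):
    """|H(f)|^2 for the two-tap averaging filter h = [1/2, 1/2] (assumed low-pass)."""
    return np.cos(np.pi * f) ** 2

def A(f):
    """A(f) = |H(f)|^2 |H(f+0.5)|^2 [S_x(f) - S_x(f+0.5)]^2."""
    return H2(f) * H2(f + 0.5) * (S_x(f) - S_x(f + 0.5)) ** 2

def exp_log_integral(g, a, b, n=20000):
    """exp of the integral of log g over [a, b], using the midpoint rule."""
    f = a + (np.arange(n) + 0.5) * (b - a) / n
    return np.exp(np.sum(np.log(g(f))) * (b - a) / n)

entropy_power = exp_log_integral(S_x, -0.5, 0.5)
independent = exp_log_integral(lambda f: A(f) + S_x(f) * S_x(f + 0.5), -0.25, 0.25)
crossband = exp_log_integral(lambda f: S_x(f) * S_x(f + 0.5), -0.25, 0.25)
```

For the AR(1) source the entropy power is known in closed form, $1 - \rho^2$, which gives an independent check on the numerical integration.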


References 

[1] J. W. Woods and S. D. O'Neil, "Subband coding of images,"
IEEE Trans. Acoust., Speech, Signal Processing, vol. 34,
pp. 1278-1288, Oct. 1986.

[2] W. A. Pearlman, "Performance bounds for subband coding,"
in Subband Image Coding (J. W. Woods, ed.), ch. 1, pp. 1-41,
Norwell, MA: Kluwer Academic Publishers, 1991.

[3] T. R. Fischer, "On the rate-distortion efficiency of subband
coding," IEEE Trans. Info. Theory, pp. 426-428, Mar. 1992.

[4] A. N. Shiryayev, Probability. New York: Springer-Verlag, 1984.




OPERATIONAL  RATE  DISTORTION  THEORY 


Ilan  Sadeh 

CS.  Dept.  Ben  Gurion  University,  Beer  Sheva,  Israel,  sade@ivory.bgu.ac.il 


The  paper  treats  data  compression  from  the  viewpoint 
of  information  theory  where  a  certain  error  probabil¬ 
ity  is  tolerable.  We  obtain  bounds  for  the  minimal  rate 
given  an  error  probability  for  blockcoding  of  general  sta¬ 
tionary  ergodic  sources.  An  application  of  the  theory  of 
large  deviations  provides  numerical  methods  to  compute 
for  memoryless  sources,  the  minimal  compression  rate 
given  a  tolerable  error  probability.  Interesting  connec¬ 
tions  between  Cramer’s  functions  and  Shannon’s  theory 
for  lossy  coding  are  found. 

1.  The  deterministic  partition  approach 

Given $u \in U$ and $v \in V$, a distortion measure is any
real positive function $d : U \times V \to \mathbb{R}^+$. Let $\rho_l(u; v)$
denote the distortion for a block, i.e. the average of the
per-letter distortions for the letters that comprise the block:
$\rho_l(u; v) = \frac{1}{l}\sum_{i=1}^{l} d(u_i; v_i)$. Let D be a given tolerable
level of distortion relative to the memoryless distortion
measure $d(u, v)$.

The set of all possible codewords is partitioned into two
disjoint subsets: the Codebook and its complement set. The
Codebook contains all the codewords in the code. Each
sourceword u of length $l$ is mapped onto exactly one
of the codewords in the Codebook, provided the distortion
of the block is not larger than $lD$. Otherwise, the
sourceword is included in the Error set and a coding
failure is said to have occurred.

Definition 1: A D-Ball covering of a codeword v,
denoted $T(v)$, is the set of all sourcewords such that

$T(v) = \{u \mid \rho_l(u, v) \le D\}$.

That is, we define spheres around all the possible codewords
v. But these spheres do not define probabilities
on the codewords. Each sourceword should be mapped
to exactly one codeword. Thus, we denote the set of the
sourcewords that map to the codeword v after a partition
as $A(v)$. Consequently the induced $l$-order entropy
is $H_v(l) = -\frac{1}{l}\sum_{v} \Pr(v)\log\Pr(v)$.

Definition 2: An acceptable partition of blocklength
$l$ is a partition on the space of $l$-length sourcewords
such that for all v, the associated subset $A(v)$
satisfies $A(v) \subseteq T(v)$ and that $\lim_{l\to\infty} H_v(l)$ exists.

Definition 3: The set D-Ball(u) is defined as
D-Ball(u) = $\{v \mid \rho_l(u, v) \le D\}$.
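The block distortion and the D-Ball of Definition 3 can be made concrete with a small enumeration. The sketch below is only an illustration: it assumes a binary alphabet and the Hamming per-letter distortion, and the function names are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Per-letter distortion d(u_i, v_i): 0 if equal, 1 otherwise (illustrative choice)."""
    return 0.0 if a == b else 1.0

def block_distortion(u, v, d=hamming):
    """rho_l(u; v): average of the per-letter distortions over a block of length l."""
    assert len(u) == len(v)
    return sum(d(a, b) for a, b in zip(u, v)) / len(u)

def d_ball(u, D, alphabet=(0, 1), d=hamming):
    """D-Ball(u): all candidate codewords v with rho_l(u, v) <= D."""
    return [v for v in product(alphabet, repeat=len(u))
            if block_distortion(u, v, d) <= D + 1e-12]  # small tolerance for float rounding

# Words within average Hamming distortion 1/3 of the sourceword 000.
ball = d_ball((0, 0, 0), D=1 / 3)
```

An acceptable partition then assigns each sourceword to exactly one codeword drawn from its own D-Ball, which is what turns the overlapping spheres into disjoint sets $A(v)$.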


Lossy  AEP  Theorem: 

For any acceptable partition of blocklength $l$ and given
any $\delta > 0$, the set of all possible sourcewords of blocklength
$l$ produced by the source can be partitioned into
two sets, Error and Error$^c$, for which the following
statements hold:

1. Assuming a stationary and ergodic output process,
the probability of a sourceword belonging to Error
vanishes as $l$ tends to infinity.

2. If a sourceword u is in Error$^c$ then its associated
codeword v is in the Codebook and its probability of
occurrence is more than $e^{-l(H_v(l)+\delta)}$.

3. The number of codewords in the Codebook is at most
$e^{l(H_v(l)+\delta)}$.

Given is a stationary ergodic source u with known
probabilities for all blocklengths $l$, an acceptable average
distortion D and a tolerable error probability
$P_e$. Assuming the $l$-order entropy induced by the
chosen acceptable partition is $H_v(l)$, then the optimal
code set is $T_l(D, \delta) = \{v \mid \Pr(v) > e^{-l(H_v(l)+\delta)}\}$,
where the value $\delta$ is determined by the error probability.
The error set is defined by

$\mathrm{Error}_l(\delta, D) = \{u \mid \min_{v:\,\rho_l(u,v) \le D} -\tfrac{1}{l}\log\Pr(v) - H_v(l) > \delta\}$.
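The threshold rule for the code set can be tried out on a toy case. The sketch below makes strong simplifying assumptions that are not in the text: a binary memoryless source, zero distortion (so every sourceword is its own codeword and the induced entropy is the source entropy in nats), and brute-force enumeration of all blocks. The names and parameter values are illustrative.

```python
import math
from itertools import product

def code_set(p1, l, delta):
    """Select the codewords with Pr(v) > exp(-l (H + delta)).

    Sketch for a binary memoryless source with Pr(1) = p1 and zero distortion,
    so each sourceword is its own codeword and H is the source entropy in nats.
    """
    H = -(p1 * math.log(p1) + (1 - p1) * math.log(1 - p1))
    threshold = math.exp(-l * (H + delta))
    codebook, covered = [], 0.0
    for v in product((0, 1), repeat=l):
        ones = sum(v)
        pr = p1**ones * (1 - p1) ** (l - ones)
        if pr > threshold:
            codebook.append(v)
            covered += pr
    # Probability mass left outside the codebook is the coding-failure probability.
    return codebook, 1.0 - covered, H

codebook, error_prob, H = code_set(p1=0.2, l=12, delta=0.1)
rate = math.log(len(codebook)) / 12  # nats per source letter
```

Increasing `delta` enlarges the codebook and lowers the failure probability, which is the trade-off between rate and tolerable error probability that the section describes.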

2.  Bounds  on  Memoryless  Sources. 

For a given $P_e$, a bound on the average distortion level D
and a blocklength $l$, we find the best compression rate.
The results, developed for memoryless sources, might be
generalized to classes of sources for which there is a
well-developed body of large deviations results for the
source output process.

Our  approach  to  the  problem  is  based  on  transforma¬ 
tion  of  the  deterministic  problem  to  a  stochastic  one 
and  calculation  of  the  error  probability  and  the  rate, 
using  large  deviations  theory.  Optimizing  over  all  pos¬ 
sible  transitions  matrices  for  a  given  error  probability 
provides  the  solution.  The  loss  of  Wq(6)  amount  of  in¬ 
formation  in  the  transmission  results  in  the  compression 
by  gaining  ^q(8)  nats.  The  term  6  is  determined  by  the 
tolerable error probability. We obtain a "conservation
law", where the amount of the lost information is equal
to the gain in the compression, only in the lossless case.
It is an interesting interpretation for the two Cramér
functions in the context of lossy data compression.
Abstract — We apply the relative entropy functional
to sets of Line-Spectrum Pairs (LSPs) and transform-based
generalized spectral pmfs of [1] and present experimental
results for sequence segmentation and vector quantization
which show that the relative entropy of these quantities is
a useful indicator for variable-rate speech coding.

I.  Activity  Measures 

Numerous  methods  for  evaluating  spectral  differences  (or 
distortion)  have  been  explored  in  the  literature.  Many  of 
the  popular  techniques  pertain  to  optimal  one-step-ahead  lin¬ 
ear  prediction,  or  LPC  models  in  speech  processing  systems. 
These  approaches  have  been  used  to  minimize  distortion  in 
vector  quantizer  (VQ)  design  for  fixed-rate  coding  and  for  per¬ 
formance  evaluation  of  different  coding  systems. 

Recently,  spectral  entropy  has  been  proposed  as  a  different 
indicator  of  spectral  information  content  and  coefficient  rate 
[1].  Here,  we  combine  previous  results  which  use  subband 
spectral  flatness  measures  for  time-domain  speech  segmenta¬ 
tion  [2,  3]  with  a  different  application  of  the  concept  of  spectral 
distance.  This  approach  produces  encoding  cues  which  allow 
for  the  efficient  allocation  of  rate  in  both  the  time  and  fre¬ 
quency  domains. 

The  information-theoretic  functional  relative  entropy  is  a 
convenient  indicator  of  distance,  since  it  produces  a  measure 
of  the  difference  between  a  target  distribution  and  a  source 
distribution.  The  usual  entropy  functional  is  a  special  case 
of  relative  entropy  where  the  source  distribution  is  assumed 
to  be  uniform,  and  this  case  is  of  particular  interest  in  wave¬ 
form  segmentation  [1-3].  Thus,  the  use  of  relative  entropy  on 
appropriately  normalized  spectral  data  can  be  helpful  in  de¬ 
scribing  the  flatness  of  the  spectrum  with  respect  to  an  average 
energy  level,  or  in  determining  the  evolution  of  nonstationary 
spectral  representations. 

For  example,  by  dividing  the  normalized  spectrum  into  up¬ 
per  and  lower  halfbands  and  applying  the  entropy  functional 
to  each  halfband,  we  can  derive  an  instantaneous  indicator  of 
useful  bandwidth.  This  technique  can  be  applied  recursively 
to  further  refine  the  estimate.  These  halfband  indications  can 
be  used  to  reduce  encoding  rate  in  the  context  of  a  scalar 
coder  [3]  by  dynamically  changing  the  sampling  rate  of  the 
signal  and  in  the  context  of  a  vector  coder  [4]  by  changing  the 
allocation  of  rate  for  spectral  VQ. 
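One possible reading of the halfband indicator is sketched below. The normalization (each halfband rescaled to sum to one before the entropy is taken) and the function names are assumptions made for illustration; the summary does not spell out these details.

```python
import numpy as np

def entropy(p):
    """Entropy in nats of a pmf, ignoring zero-probability bins."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def halfband_entropies(power_spectrum):
    """Entropy of the lower and upper halfbands, each renormalized to a pmf.

    A halfband with no energy is assigned entropy 0 (assumed convention).
    """
    s = np.asarray(power_spectrum, dtype=float)
    half = len(s) // 2
    result = []
    for band in (s[:half], s[half:]):
        total = band.sum()
        result.append(entropy(band / total) if total > 0 else 0.0)
    return tuple(result)

flat = halfband_entropies(np.ones(8))                       # white spectrum
lowpass = halfband_entropies([1, 1, 1, 1, 0, 0, 0, 0])      # energy in lower half only
```

Applying the same function to each half again gives the recursive refinement mentioned in the text.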

II.  Line  Spectral  Entropy 

The  Line  Spectral  Frequencies  (LSF)  or  Line  Spectrum 
Pairs  (LSP)  introduced  by  Itakura  are  an  alternative  LPC 
spectral  representation  with  several  convenient  properties  (or¬ 
dering/interlacing,  independence,  dynamic  range)  which  have 
been  examined  closely  in  the  context  of  LPC  quantization.  As 
a  result  of  these  properties,  an  LSP  vector  can  be  interpreted 
as  a  generalized  pmf  of  vocal  tract  resonances,  and  so  applica¬ 
tion  of  the  entropy  functional  produces  intuitive  results.  High 


values  of  the  “line-spectral  entropy”  indicate  a  flat  spectrum, 
and  low  values  indicate  a  textured  spectrum. 

The relative entropy between two pmfs is defined by

$$D(p\|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}. \qquad (1)$$

Considering  the  pmfs  in  (1)  to  be  derived  from  LSPs  (as  gen¬ 
eralized  pmfs),  the  relative  entropy  can  provide  some  indica¬ 
tion  of  the  similarity  between  two  spectral  envelopes.  This 
leads  to  some  interesting  interpretations  for  the  selection  of 
optimal  paths  to  minimize  distortion  and  detection  of  change- 
points  in  speech  waveforms.  In  this  case,  the  relative  entropy 
provides  a  measure  of  stationarity  for  the  AR  process  esti¬ 
mates  which  have  been  derived  from  local  segments  of  speech 
data. Since $D(p\|q)$ is minimized by $q = p$, a small value of
$D(p_i\|p_{i-1})$ (where the subscript indicates a frame time) indicates
a slowly varying spectrum whereas large values indicate
a rapidly changing spectral envelope. This measure can be
applied to any subset of elements of the LSPs to determine
the rate of evolution of that group of resonances. Also, if we
assume for each i that the spectrum of the current (ith) frame
has evolved in one frame time from complete whiteness,

$$D(p_i\|p_{i-1}) = \log m - H(p_i) \qquad (2)$$

since $p_{i-1}$ is the uniform distribution of m resonances. So, the
line-spectral entropy can be seen as a particular interpretation
of relative entropy which measures the spectral evolution with
respect to whiteness at each frame time.
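The relation between relative entropy and entropy with a uniform reference can be checked directly. In the sketch below the pmf `p_i` is a made-up example standing in for a generalized pmf derived from one frame; how such a pmf is obtained from LSPs is not specified here.

```python
import math

def entropy(p):
    """H(p) in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

def relative_entropy(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)), in nats; needs q(x) > 0 wherever p(x) > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

p_i = [0.4, 0.3, 0.2, 0.1]      # hypothetical generalized pmf of frame i
m = len(p_i)
uniform = [1.0 / m] * m         # "complete whiteness" reference

lhs = relative_entropy(p_i, uniform)
rhs = math.log(m) - entropy(p_i)
```

The same `relative_entropy` function applied to consecutive frames, `relative_entropy(p_i, p_prev)`, gives the frame-to-frame change indicator described above.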
Abstract — Characterization of the achievable distortion
region, when correlated information sources
are transmitted via a multiple-access channel, is studied.
An inner bound for the set of achievable distortions
is obtained and it is shown that certain known
results in multi-terminal source and channel coding
can be considered as special cases of this result.

I.  Introduction 

Transmission of arbitrarily correlated information sources over
a multiple-access channel was first addressed in [1], where
sufficient conditions for reliable transmission were derived.
However, determining the necessary conditions for this
communication model still remains an open problem. In particular,
it has been shown that the conditions derived in [1] are not
in general necessary conditions. Nevertheless, so far no other
conditions that are more general than those in [1] are known.

Coding  of  correlated  information  sources  subject  to  a  fi¬ 
delity  criterion  has  been  considered  in  a  number  of  works  in¬ 
cluding  [2],  [3],  and  [4].  This  problem,  in  general,  also  remains 
an  open  problem  and  characterization  of  the  rate-distortion 
region  for  this  case  is  not  yet  known  except  for  some  special 
cases. 

In this work we consider a communication model in which
two correlated information sources are to be transmitted via a
multiple-access channel and to be reproduced at the receiver
subject to two distortion measures. We derive a set of achievable
distortions for this source-channel configuration and show
that many of the previously known results on transmission of
correlated information sources via a multiple-access channel
and rate-distortion region for correlated information sources
can be considered as special cases of this result.

II.  The  Communication  Model 
Two discrete memoryless correlated information sources
$\{(S_k, T_k)\}$ are modeled by independent drawings of two random
variables S and T which are distributed according to
$p^*(s, t)$. The corresponding alphabets are denoted by $\mathcal{S}$ and
$\mathcal{T}$. A discrete memoryless multiple-access channel with two
transmitters and one receiver is described in terms of its input
alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$, the output alphabet $\mathcal{Y}$, and the
conditional probability mass function $p^*(y|x_1, x_2)$.

Sources S and T are connected to the first and the second
transmitters respectively. It is assumed that for each
(S, T) pair generated by the sources, one $(X_1, X_2)$ pair can be
transmitted over the channel. At the receiver the decoder
estimates $\{(\hat{S}_k, \hat{T}_k)\}$ as the source outputs, where
$(\hat{S}_k, \hat{T}_k) \in \hat{\mathcal{S}} \times \hat{\mathcal{T}}$
and $\hat{\mathcal{S}}$ and $\hat{\mathcal{T}}$ denote the reproduction alphabets for the two
sources. Two distortion functions $d_1 : \mathcal{S} \times \hat{\mathcal{S}} \to R^+$ and
$d_2 : \mathcal{T} \times \hat{\mathcal{T}} \to R^+$ represent the corresponding fidelity criteria.

2This  work  was  partially  supported  by  the  NSF  Grant  NCR- 
9101560. 


A distortion pair $(D_1, D_2)$ is achievable if for any $\epsilon_1 > 0$
and $\epsilon_2 > 0$ there exist an integer n, two encoding functions
$f_1 : \mathcal{S}^n \to \mathcal{X}_1^n$ and $f_2 : \mathcal{T}^n \to \mathcal{X}_2^n$, and one decoding function
$g : \mathcal{Y}^n \to \hat{\mathcal{S}}^n \times \hat{\mathcal{T}}^n$ such that

$$\frac{1}{n}\sum_{k=1}^{n} E[d_1(S_k, \hat{S}_k)] \le D_1 + \epsilon_1$$

$$\frac{1}{n}\sum_{k=1}^{n} E[d_2(T_k, \hat{T}_k)] \le D_2 + \epsilon_2$$

Let $\mathcal{D} \subset R_+^2$ denote the set of all achievable distortion pairs
$(D_1, D_2)$.

III. Main Result

Our main result is the derivation of an inner bound for the
set $\mathcal{D}$ as stated in the following theorem.

Theorem: If there exist

1. Auxiliary random variables U and V taking values in
finite sets $\mathcal{U}$ and $\mathcal{V}$ such that $U \to S \to T \to V$ form a
Markov chain.

2. Functions $h_1 : \mathcal{U} \times \mathcal{V} \to \hat{\mathcal{S}}$ and $h_2 : \mathcal{U} \times \mathcal{V} \to \hat{\mathcal{T}}$

such that

$$I(S; U|V) < I(X_1; Y|X_2, V)$$

$$I(T; V|U) < I(X_2; Y|X_1, U)$$

$$I(S, T; U, V) < I(X_1, X_2; Y)$$

for some

$$p(s,t,u,v,x_1,x_2,y) = p^*(s,t)\,p(u|s)\,p(v|t)\,p(x_1|u,s)\,p(x_2|v,t)\,p^*(y|x_1,x_2)$$

and $D_1$ and $D_2$ are given by

$$D_1 = E[d_1(S, h_1(U,V))]$$

$$D_2 = E[d_2(T, h_2(U,V))]$$

then $(D_1, D_2) \in \mathcal{D}$.
Abstract — We address the problem of finite-state
code construction for the costly channel.
Adler et al. developed the powerful state-splitting
algorithm for use in the construction
of finite-state codes for hard-constrained
channels. We extend the state-splitting
algorithm to the costly channel. We present
several examples of costly channels related to
magnetic recording, the telegraph channel, and
shaping gain in modulation. We design a
number of synchronous and asynchronous
codes, some of which come very close to
achieving capacity.

I.  Introduction 

In  a  costly  channel,  sequences  of  symbols  are  assigned 
costs  (possibly  infinite).  A  constraint  in  the  form  of  an 
average  cost  is  imposed  on  the  sequences.  A  costly  channel 
is  a  natural  generalization  of  a  hard-constrained  channel  (or 
subshift),  where  sequences  are  assigned  either  cost  zero  or 
infinity.  Many  hard-constrained  channels  of  interest  have  a 
finite-state  structure,  and  can  be  represented  by  finite 
directed  graphs.  Similarly,  finite-state  costly  channels  can 
be  represented  by  finite  directed  graphs  with  an  additional 
cost  labeling. 

We  present  a  method  for  constructing  finite-state 
codes  for  the  costly  channel.  Our  finite-state  codes  come  in 
two  varieties.  The  first  is  a  synchronous  (fixed-length  to 
fixed-length)  code.  The  second  is  an  asynchronous 
(variable-length  to  fixed-length)  code.  The  latter  has  a 
higher  rate,  but  it  has  the  drawbacks  common  to  all 
asynchronous  schemes,  in  particular  the  potential  for  error 
propagation.  At  the  heart  of  our  method  is  a  modified 
version  of  the  state  splitting  algorithm  of  Adler, 
Coppersmith,  and  Hassner.  The  capacity-cost  function 
C(p)  is  the  maximum  code  rate  for  a  given  target  cost  p. 
Our  asynchronous  codes  come  very  close  to  achieving 
C(p),  while  the  synchronous  codes  achieve  a  lower  rate,  but 
still  come  pretty  close  to  C(p).  Given  a  graph  G 
representing  the  costly  channel,  C(p)  is  achieved  by  a 
Markov  chain  defined  on  the  edges  of  G.  We  associate  with 
G  a  modified  adjacency  matrix  B  that  reflects  the  target 
cost  p.  Then 

$$C(p) = \log\lambda + \mu\,p\,\log e$$

where $\lambda$ is the largest eigenvalue of B, and $\mu = dC/dp$.
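A point on the capacity-cost curve can be computed numerically once B is fixed. The summary does not define B, so the sketch below assumes the common exponential weighting, in which the entry of B for an edge of cost c is $e^{-\mu c}$, and it pairs each $\mu$ with the cost $p = -d(\ln\lambda)/d\mu$ by a finite difference. The function name and the one-state example channel are hypothetical.

```python
import numpy as np

def capacity_cost_point(edges, n_states, mu, eps=1e-6):
    """Return (p, C) for one value of mu, with C in bits per symbol.

    edges: list of (from_state, to_state, cost).
    Assumption: B[i, j] = sum over edges i -> j of exp(-mu * cost).
    """
    def log_lambda(m):
        B = np.zeros((n_states, n_states))
        for i, j, cost in edges:
            B[i, j] += np.exp(-m * cost)
        return np.log(np.max(np.abs(np.linalg.eigvals(B))))

    # Average cost of the entropy-maximizing chain: p = -d(ln lambda)/d(mu).
    p = -(log_lambda(mu + eps) - log_lambda(mu - eps)) / (2 * eps)
    C = (log_lambda(mu) + mu * p) / np.log(2)   # log2(lambda) + mu * p * log2(e)
    return p, C

# Hypothetical one-state channel with two symbols of cost 1 and 2.
edges = [(0, 0, 1.0), (0, 0, 2.0)]
p0, C0 = capacity_cost_point(edges, 1, mu=0.0)
p1, C1 = capacity_cost_point(edges, 1, mu=0.5)
```

For this one-state example the result can be checked by hand: the optimal chain picks the two symbols with probabilities proportional to $e^{-\mu}$ and $e^{-2\mu}$, and C is the binary entropy of that choice.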


David  L.  Neuhoff 

EECS  Department 
University  of  Michigan 

Ann  Arbor,  MI  48109,  USA 

II.  Code  Construction 

Our construction is summarized as follows. For a given
p, we choose $n \ge 1$ and $m \ge 1$ such that $m/n$ does not
exceed a function related to C(p). Then we construct an
asynchronous encoder graph with power n and with
smallest state outdegree equal to $2^m$ (and every outdegree a
power of 2), and consequently, whose rate exceeds $m/n$.
We also construct a synchronous encoder with every
outdegree equal to $2^m$, and rate $m/n$.

We  assume  that  the  information  source  is  binary  IID 
with  a  uniform  distribution.  The  source  induces  a  stationary 
Markov  chain  on  the  encoder  graph,  where  the  edges 
leaving  a  state  have  a  uniform  conditional  probability.  It 
also  yields  a  coding  rate,  and  a  coding  cost. 

The  idea  of  the  code  construction  is  to  obtain  an 
encoder  graph  such  that  the  source-induced  Markov  chain 
coincides  with  the  optimal  Markov  chain  that  achieves 
capacity.  Then  the  code  will  actually  achieve  capacity.  It 
turns  out  that  in  most  cases,  we  can  only  approximate  the 
optimal  Markov  chain,  but  the  resulting  codes  are  still  very 
good. 

Our  construction  consists  of  three  stages.  It  uses  state 
splitting  and  edge  pruning.  First,  we  use  state  splitting  to 
obtain  a  uniform  cost  graph,  that  is  one  where  all  edges 
leaving  a  state  have  the  same  cost.  Secondly,  we  use  state 
splitting in a way similar to Adler et al. Let v denote the
eigenvector corresponding to $\lambda$. We perform a sequence of
state splittings to obtain a graph whose v (or a related
approximate eigenvector x) is equal to the all-ones vector.
Thirdly,  we  use  edge  pruning  to  obtain  a  graph  with  the 
appropriate  state  outdegrees.  Edge  pruning  must  be  done 
carefully,  since  it  affects  both  the  coding  rate  and  the 
coding  cost. 

III.  Examples 

We introduce costly channel models for magnetic
recording, namely variations on the (1,3) and (2,7)
hard-constrained channels. We also consider Shannon's
telegraph model as a costly channel, and relate his
definition of capacity to our capacity-cost function.
Finally  we  show  an  application  of  our  technique  to  the 
problem  of  shaping  in  amplitude  modulation.  Our  codes  are 
consistently  good,  and  several  almost  achieve  capacity. 
Their  complexity  is  low,  judging  by  the  number  of  encoder 
states. 
On the Capacity of M-ary Run-Length-Limited Codes

Abstract — We present two results on the Shannon
capacity of M-ary (d, k) codes. First we show that
100-percent efficient fixed-rate codes are impossible
for all values of (M, d, k), $0 \le d < k < \infty$, $M < \infty$,
thereby extending a result of Ashley and Siegel to
M-ary channels. Second, we show that (unlike the binary
case) for $k = \infty$, there exist an infinite number of
100-percent efficient M-ary (d, k) codes and we construct
one such code.


I. Summary

Traditional magnetic and optical recording employ saturation
recording, where the channel input is constrained to be a
binary sequence satisfying run-length-limiting (RLL) or (d, k)
constraints. A binary (d, k) sequence is one where the number
of zeroes between consecutive ones is at least d and at
most k.

The recording media in [1] support unsaturated, M-ary
($M \ge 3$) signaling while requiring that run-length-limiting
constraints be satisfied. Assuming an M-ary symbol alphabet
$A = \{0, 1, \ldots, M-1\}$, $M < \infty$, an M-ary run-length-limited
or (M, d, k) sequence [2] is one where at least d and at most
k zeroes occur between nonzero symbols. Binary (d, k) codes
are M-ary (d, k) codes with M = 2.

In [3] (and the applicable corrections in [4]) it was shown
that for binary (d, k) codes, there exist no 100-percent
efficient codes. Specifically, the Shannon capacity of all binary
RLL (d, k) constraints is irrational for all values of (d, k),
$0 \le d < k < \infty$, and hence there exist no fixed-rate codes
that achieve capacity. In this paper we present two
propositions on (M, d, k) codes. First, for any integer M, 100-percent
efficient fixed-rate codes are impossible for all values of (d, k),
$0 \le d < k < \infty$, thereby extending [3] to the M-ary channel.
Secondly, unlike [3], for $k = \infty$ there do exist (an infinite
number of) 100-percent efficient codes, and we construct one such
code using the state splitting algorithm [5].

The RLL (M, d, k) constraint is often represented by a finite
state transition diagram (FSTD) G. Associated to the FSTD
with k + 1 vertices is a state-transition matrix T, a
$(k+1) \times (k+1)$ matrix defined by $T = [t_{ij}]$ where $t_{ij}$ is the
number of edges in G from state i to state j.

Shannon showed that if the FSTD G has distinct labels on
the outgoing edges of each state, the capacity of a system
constrained by sequences from G is $C = \log_2\lambda$ (bits per symbol),
where $\lambda$ is the largest real eigenvalue of the adjacency matrix
T associated with G. We have assumed base-2 logarithms
because it is the most common case, but all results are
extendable to non-base-2 logarithms.

We consider fixed-rate r = m/n encoders that map m user
bits to n channel symbols satisfying the (M, d, k) constraints.
The capacity C is therefore an upper bound to all achievable
rates r, and the code efficiency E = r/C is the ratio of the
actual coding rate to the largest rate achievable. We give the
following two propositions and one new capacity-achieving
code. Denote the binary capacity of the (M, d, k) constraint
as C(M, d, k).

Proposition 1: C(M, d, k) is irrational for all (M, d, k),
$0 \le d < k < \infty$, $M < \infty$.

Since this capacity is irrational, there exist no 100-percent
efficient fixed-rate r = m/n codes, namely r < C(M, d, k).

Proposition 2: For any $0 \le d < \infty$ and $k = \infty$, the
set of M's for which $C(M, d, \infty)$ is rational is
$\{M : M = 2^{dm}(2^m - 1) + 1,\ \text{integer } m \ge 1\}$.

Since this capacity is rational for some M, there exist
100-percent efficient fixed-rate r = m/n = C codes that achieve
capacity C(M, d, k). What follows is a construction of one such
code satisfying (5, 2, ∞) constraints using the state splitting
algorithm. For details on the state splitting algorithm see [5].

(5, 2, ∞) code: The adjacency matrix for the (5, 2, ∞) constraint is

$$T = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 4 & 0 & 1 \end{bmatrix} \qquad (1)$$

with capacity C = 1. Choosing m = n = 1, one can design
a rate r = m/n = 1 = C code. An approximate eigenvector
v satisfying $Tv \ge 2v$ is $v = (1, 2, 4)^T$. After two rounds of
splitting and some simple merging, a five-state encoder results
[6]. A sliding block decoder with a sliding window six symbols
wide (corresponding to memory m = 3 and anticipation a = 2)
is sufficient.
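The capacity values quoted above can be reproduced from the state-transition matrix. The sketch below builds T under the usual convention that state i counts the zeros since the last nonzero symbol (an assumption about the state labelling, chosen so that the (5, 2, ∞) case matches the matrix given in the text) and takes $\log_2$ of the largest eigenvalue. Function names are illustrative.

```python
import numpy as np

def rll_matrix(M, d, k=None):
    """State-transition matrix of the (M, d, k) constraint; k=None means k = infinity.

    State i counts the zeros since the last nonzero symbol
    (capped at d when k is infinite).
    """
    n = (d if k is None else k) + 1
    T = np.zeros((n, n))
    for i in range(n):
        if i + 1 < n:
            T[i, i + 1] = 1          # emit a zero
        elif k is None:
            T[i, i] = 1              # k = infinity: further zeros stay in the last state
        if i >= d:
            T[i, 0] += M - 1         # emit one of the M - 1 nonzero symbols
    return T

def capacity(M, d, k=None):
    """C = log2 of the largest eigenvalue of T, in bits per symbol."""
    eigenvalues = np.linalg.eigvals(rll_matrix(M, d, k))
    return float(np.log2(np.max(np.abs(eigenvalues))))

T = rll_matrix(5, 2)          # the (5, 2, infinity) constraint
v = np.array([1, 2, 4])       # approximate eigenvector from the text
```

Besides the (5, 2, ∞) case, Proposition 2 can be spot-checked the same way: for d = 1 and m = 2 the formula gives M = 13, and `capacity(13, 1)` evaluates to 2 bits per symbol up to rounding.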


II.  Acknowledgment 

The  authors  gratefully  acknowledge  Optex  Communications 

Corporation  whose  support  initiated  this  work. 

References 

[1] A. Earman, "Optical data storage with electron trapping
materials using M-ary data channel coding," Proceedings of the SPIE
1663, Optical Data Storage, pp. 92-103, San Jose, CA, 1992.

[2] D. T. Tang and L. R. Bahl, "Block Codes for a Class of
Constrained Noiseless Channels," Information and Control, vol. 17,
1970, pp. 436-461.

[3] J. Ashley and P. Siegel, "A Note on the Shannon Capacity of
Run-length Limited Codes," IEEE Trans. Inform. Theory, vol.
IT-33, no. 4, pp. 601-605, July 1987.

[4] J. Ashley, M. Hilden, P. Perry and P. Siegel, "Correction to 'A
Note on the Shannon Capacity of Runlength-Limited Codes',"
IEEE Trans. Inform. Theory, vol. IT-39, no. 3, pp. 1110-1112,
May 1993.

[5] R. Adler, D. Coppersmith and M. Hassner, "Algorithms for
sliding block codes," IEEE Trans. Info. Thy., vol. 29, no. 1, pp.
5-22, Jan. 1983.

[6] S. McLaughlin, J. Luo and Q. Xie, "On the capacity of M-ary
run-length-limited codes," to appear, IEEE Trans. on Information
Theory.




Joint  Multilevel  RLL  and  Error  Correction  Coding 

Mohamed  Siala  and  Ghassan  Kawas  Kaleh 

Ecole  Nationale  Superieure  des  Telecommunications, 

46,  rue  Barrault,  75634  Paris  13,  France 


Runlength-limited (RLL) codes, also known as $(d, k)$ RLL codes, are
used in digital magnetic recording and have potential use in soliton
optical communication. Let $0^m$ denote a sequence of $m$ successive
zeros. We define the alphabet

$$\mathcal{H}_{dk} = \{0^d 1,\ 0^{d+1} 1,\ \ldots,\ 0^k 1\}.$$

A $(d, k)$ phrase is an element of $\mathcal{H}_{dk}$. A $(d, k)$
runlength-limited sequence is defined as a concatenation of such
phrases.

At ISIT'94 we presented a new approach for constructing simple and
efficient variable-length $(d, k)$ RLL codes that can be decoded with
no memory and no anticipation. An $n$-dimensional RLL code $C_{dk}$ is
defined as

$$C_{dk} = \{p = (p_1, p_2, \ldots, p_n) \in (\mathcal{H}_{dk})^n :
l(p_1) + l(p_2) + \cdots + l(p_n) \le nL\},$$

where $l(p)$ is the length of an element $p$ of $\mathcal{H}_{dk}$ and
$L$ is a normalized threshold.

We show here that if the size of $\mathcal{H}_{dk}$, $k - d + 1$, is
equal to $b^m$ for some integers $b$ and $m$, both greater than or
equal to 2, the encoding/decoding algorithms can be greatly
simplified by using $m$ parallel simple codes, called component
codes, with the same properties as the original code $C_{dk}$.

The block diagram of the proposed coding system is depicted in the
Figure. A binary information sequence $I$ is demultiplexed into $m$
binary subsequences $I_0, I_1, \ldots, I_{m-1}$.

Each subsequence is encoded by an independent $b$-ary shaping set,
denoted by $S_i$, $i = 0, 1, \ldots, m-1$. The rate of the code $S_i$
is $R_i = k_i/n_i$. Its output $b$-ary $n_i$-tuples $x^i_j$,
$j = \ldots, -1, 0, 1, \ldots$, are concatenated into a $b$-ary
infinite sequence $x^i$. Let $Z_b$ denote the $b$-ary alphabet
$\{0, 1, \ldots, b-1\}$, and $x^i_t$ the output (in $Z_b$) of the
encoder for $S_i$ at time $t$, where $t$ is an integer. The $b$-ary
symbols $x^0_t, x^1_t, \ldots, x^{m-1}_t \in Z_b$ are mapped
synchronously onto a phrase length

$$l_t = d + 1 + \sum_{i=0}^{m-1} x^i_t\, b^i \qquad (1)$$

in

$$J_{dk} = \{d+1, d+2, \ldots, k+1\}.$$

The phrase length $l_t$ is then mapped into the phrase $p_t$ in
$\mathcal{H}_{dk}$ with length equal to $l_t$. The resulting phrase
$p_t$ is subsequently recorded on the magnetic medium or transmitted
using soliton pulses.
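
The mapper and its inverse can be illustrated with a short sketch. It assumes that the $m$ symbols are combined as base-$b$ digits (the weighting $b^i$ is a reconstruction, since the sum in the printed equation is not legible) and that a phrase of length $l$ is the string $0^{l-1}1$; the function names are hypothetical.

```python
from itertools import product

def digits_to_phrase(digits, b, d):
    """Map m base-b digits (x_t^0, ..., x_t^{m-1}) to a (d, k) phrase.

    Assumes the phrase length is d + 1 + sum_i x_t^i * b**i.
    """
    if any(not 0 <= x < b for x in digits):
        raise ValueError("every digit must lie in Z_b")
    length = d + 1 + sum(x * b ** i for i, x in enumerate(digits))
    return "0" * (length - 1) + "1"

def phrase_to_digits(phrase, b, m, d):
    """Inverse mapper: recover the m digits from the phrase length."""
    value = len(phrase) - (d + 1)
    digits = []
    for _ in range(m):
        digits.append(value % b)
        value //= b
    return digits

if __name__ == "__main__":
    # Example with d = 1, b = 2, m = 2, so k - d + 1 = b**m = 4 and k = 4.
    b, m, d = 2, 2, 1
    for digits in product(range(b), repeat=m):
        phrase = digits_to_phrase(digits, b, d)
        print(digits, phrase, phrase_to_digits(phrase, b, m, d))
```

Because the phrase boundary is marked by the single "1", an inserted or deleted zero changes only the decoded digit values and not the number of phrases.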

To simplify the notation, we assume that the readback signal is
available after a null time delay. The phrase at time $t$ in the
readback signal (for recording systems) or in the received signal (for
fiber-optic transmission using solitons), denoted by $\hat{p}_t$, is
assumed to be in $\mathcal{H}_{dk}$. Denote by $\hat{l}_t$ the length
of $\hat{p}_t$. The inverse mapper outputs the estimates
$\hat{x}^i_t \in Z_b$, $i = 0, 1, \ldots, m-1$, satisfying

$$\sum_{i=0}^{m-1} \hat{x}^i_t\, b^i = \hat{l}_t - (d + 1),$$

to the decoding circuit. The estimates $\hat{x}^i$ of the codewords
$x^i$ are obtained from the final estimated sequences
$\{\ldots, \hat{x}^i_{t-1}, \hat{x}^i_t, \hat{x}^i_{t+1}, \ldots\}$,
$i = 0, 1, \ldots, m-1$. These sequences are split into the
$n_i$-tuples $\hat{x}^i_j$, $j = \ldots, -1, 0, 1, \ldots$. These
$n_i$-tuples, which are estimates of the $n_i$-tuples $x^i_j$ at the
encoder side, are concatenated into the estimate $\hat{x}^i$ of the
sequence $x^i$ at the output of the shaping sets $S_i$.

The decoders for $S_i$ use these sequence estimates and deliver the
estimates $\hat{I}_i$, $i = 0, 1, \ldots, m-1$, of the component
information sequences $I_i$. The estimate $\hat{I}$ of the desired
final information sequence $I$ is constructed by combining the digits
of the sequences $\hat{I}_i$, $i = 0, 1, \ldots, m-1$.

Moreover, we show that this approach to RLL coding can naturally
incorporate multilevel error correction coding. Recall that the most
common types of errors in digital magnetic recording are shift errors,
drop-in errors, drop-out errors, insertion errors and deletion errors.
Although deletions and insertions are not as common as the other
types of errors, they can cause catastrophic error propagation when a
conventional rate $p/q$ finite-state encoder is used, since they can
change $q$, the length of the $(d, k)$ constrained sequences generated
by this code. Deletions and insertions of zeros (“0”) as well as shift
errors do not change the number of phrases in the $(d, k)$ RLL
codewords. From this, we conclude that our approach to RLL coding
allows us to correct not only shift errors but also insertions and
deletions of zeros. The main idea for achieving this is to use
Lee-metric codes. The component codes of the multilevel block code are
chosen such that this code has a large minimum Lee distance and such
that the $(d, k)$ RLL code combined with the error correction
multilevel code provides the highest possible rate. We emphasize that
multilevel $(d, k)$ RLL coding combined with multilevel block coding
for error correction is well suited to multistage decoding.

We  also  show  that  for  an  appropriate  choice  of  the  nor¬ 
malized  thresholds  of  these  parallel  codes,  the  rate  of  the  cor¬ 
responding  (d,  k)  RLL  code  converges  to  the  capacity  of  the 
(d,  k)  constraint,  as  the  dimension  of  each  of  the  parallel  codes 
goes  to  infinity. 

Abstract - In magnetic and optical storage systems, run-length-limited
(RLL) modulation codes are widely used. An RLL code with limited error
propagation and a high density ratio is attractive for high-density
storage systems. This paper presents a new fixed-length RLL modulation
code with d = 4 and k = 20 and code rate m/n = 4/11. The code design is
based on the Look-Ahead method. The code has finite error propagation,
limited to at most two codewords.

I.  Introduction 

To increase storage capacity, runlength-limited (RLL) modulation codes
have been widely used in data storage systems such as magnetic
disk/tape and optical disk, because an RLL code can have a density
ratio (DR) larger than 1. Most RLL codes used in practical storage
systems have a low DR. The EFM (Eight to Fourteen Modulation) code and
the (1, 7) and (2, 7) RLL codes, which are widely used in contemporary
storage systems, have a DR no greater than 1.5. These codes are
therefore less attractive for high-density storage systems. As storage
systems evolve toward higher density, it becomes necessary to design
an RLL code with high DR, low hardware complexity, and low
encoding/decoding delay [1].

A (d, k) RLL code produces binary sequences in which the number of
"zeros" between consecutive "ones" is at least d and at most k. The
density ratio is determined by d and the code rate R = m/n, where m is
the information length and n is the codeword length. DR increases as d
increases. However, DR is limited by C, the capacity of the (d, k)
constrained noiseless channel [1], [2]. Many classes of design
techniques have been proposed. Adler, Coppersmith, and Hassner showed
that the sliding-block code algorithm can produce an m/n (d, k) RLL
code if R (= m/n) ≤ C for positive integers m and n [3]. Jacoby et al.
developed and refined the Look-Ahead (LA) coding method, which is
attractive in practice if m and n are small [4]. RLL codes exhibit
error propagation, and it is not always simple to achieve the minimum
amount of error propagation. Blaum showed that error propagation is a
major issue in storage systems that adopt error-correcting codes [5].

In this paper, we suggest a new fixed-codelength (4, 20) RLL code with
code rate R = 4/11, based on the Look-Ahead coding method with
full-codelength look-ahead. We will show that the (4, 20) RLL code has
error propagation limited to at most two codewords and a DR of 20/11.


II. (4, 20) RLL Modulation Code

For high-density storage systems, an RLL code needs a high DR, small
error propagation, and a low (k+1)/(d+1) [1]. In the NRZI recording
scheme, DR is (m/n)(d+1). It was shown that for positive integers m
and n, a (d, k) RLL code with R = m/n exists if R ≤ C, where C is the
capacity of the (d, k) constrained noiseless channel. The channel
capacity of the (4, ∞) RLL constraint is 0.4056. Therefore an RLL code
with d = 4 and R = 4/11 is feasible [1], [2].
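
The feasibility argument can be reproduced numerically. The sketch below uses the standard characterization of binary (d, k) capacity, $C = \log_2 \lambda$ with $\lambda$ the largest real root of $\sum_{j=d+1}^{k+1} \lambda^{-j} = 1$, solved by bisection; this formula and the function names are not taken from the paper and are stated here as assumptions.

```python
import math

def rll_capacity(d, k=None, tol=1e-12):
    """Capacity (bits/symbol) of the binary (d, k) constraint; k=None means k = infinity."""
    def excess(x):
        if k is None:
            total = x ** (-d) / (x - 1.0)    # closed form of sum_{j >= d+1} x**-j
        else:
            total = sum(x ** (-j) for j in range(d + 1, k + 2))
        return total - 1.0

    lo, hi = 1.0 + 1e-9, 2.0                 # excess(lo) > 0 >= excess(hi) when k > d
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2.0)

def density_ratio(m, n, d):
    """Density ratio for NRZI recording, DR = (m/n)(d+1)."""
    return (m / n) * (d + 1)

if __name__ == "__main__":
    print("C(4, inf) =", round(rll_capacity(4), 4))
    print("C(4, 20)  =", round(rll_capacity(4, 20), 4))
    print("R = 4/11  =", round(4 / 11, 4))
    print("DR        =", round(density_ratio(4, 11, 4), 4))
```

The rate 4/11 lies below both computed capacities, which is the condition used in the text to argue that a (4, 20) code of this rate can exist.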

The (4, 20) RLL code is a fixed-codelength RLL code with R = 4/11; it
translates an information block of 4 bits into a codeword block of 11
bits. The (4, 20) RLL code is composed of 16 codewords. All codewords
satisfy the d = 4 constraint, but consecutive codewords may still
violate it at their boundary. When the d = 4 constraint is violated,
substitutions are required to eliminate "ones" that are too close
together. There are three cases of violation. Let the precursive
codeword $P = (p_1, p_2, \ldots, p_n)$ be the first and the successive
codeword $S = (s_1, s_2, \ldots, s_n)$ be the second of two
consecutive codewords. TABLE I lists the three violation cases, where
x denotes a don't-care bit. Precursive and successive codewords are
substituted by Rule I, II, or III when a violation case occurs.

¹This work was supported by a grant from the SAIT (Samsung Advanced
Institute of Technology), Korea.


TABLE I. CASES OF VIOLATION AND SUBSTITUTION RULES

Case   Check-Bits for Violation               Rule
       p7  p8  p9  p10 p11 : s1  s2  s3
(1)    x   x   1   0   0   : 0   1   x        Rule I
(2)    x   x   1   0   0   : 1   x   x        Rule II
(3)    0   1   0   0   0   : 1   x   x        Rule III-1 (p3 = '0')
                                              Rule III-2 (p3 = '1')
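
The check-bit patterns of TABLE I can be expressed as a small classifier. This is a sketch only: it assumes the check bits are the last five bits of the precursive codeword and the first three bits of the successive codeword, the example words are hypothetical 11-bit words that satisfy d = 4 internally (not entries of the actual code table), and the substitution rules themselves are not reproduced.

```python
# (case number, pattern for p7..p11, pattern for s1..s3); 'x' is a don't-care bit.
PATTERNS = [
    (1, "xx100", "01x"),
    (2, "xx100", "1xx"),
    (3, "01000", "1xx"),
]

def matches(bits, pattern):
    """True if every non-'x' position of pattern equals the corresponding bit."""
    return all(q == "x" or q == c for c, q in zip(bits, pattern))

def violation_case(p, s):
    """Return the TABLE I case number for codewords p, s, or None if no pattern matches."""
    for case, tail, head in PATTERNS:
        if matches(p[-5:], tail) and matches(s[:3], head):
            return case
    return None

def boundary_zeros(p, s):
    """Zeros between the last 'one' of p and the first 'one' of s."""
    trailing = len(p) - len(p.rstrip("0"))
    leading = len(s) - len(s.lstrip("0"))
    return trailing + leading

if __name__ == "__main__":
    examples = [
        ("00010000100", "01000000000"),
        ("00010000100", "10000000000"),
        ("00100001000", "10000000000"),
        ("00000100000", "10000000000"),
    ]
    for p, s in examples:
        print(p, s, violation_case(p, s), boundary_zeros(p, s))
```

In each matched case the boundary run contains fewer than four zeros, which is exactly the d = 4 violation the substitution rules are meant to remove.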

The density ratio of the (4, 20) RLL code is 20/11, which is 36%
greater than that of the (1, 7) RLL code and 29% greater than that of
the EFM code. Also, (k+1)/(d+1) is 21/5, which is similar to that of
the (1, 7) RLL code. The (4, 20) RLL code has finite error propagation
limited to at most two codewords because all substituted codewords
always have five "zeros" at bit positions 7, 8, 9, 10 and 11. Thus
encoding is completed by considering only two consecutive codewords,
and an error in one codeword cannot propagate into more than two
codewords.

III. Conclusions

We designed a fixed-codelength (4, 20) RLL modulation code for
high-density storage systems. The density ratio of the (4, 20) RLL
code is greater than that of the (1, 7) RLL code and the EFM code. It
also has finite error propagation limited to at most two codewords.
The (4, 20) RLL code has low hardware complexity and can be
implemented with a small look-up table.

Abstract - We present the most remarkable results obtained from a
numerical analysis of relevant statistical properties of binary
maxentropic DC-free runlength-limited (DCRLL) sequences. In
particular, we consider the sum variance and its relation to the
low-frequency characteristic or the redundancy. Further, we present an
approximation of the runlength distribution of binary maxentropic pure
charge constrained sequences.

I.  Introduction 

Binary DC-free runlength-limited (DCRLL) sequences are widely applied
in digital storage systems, for example in the CD player [2]. The fact
that there is still no profound knowledge of the relevant statistical
properties of these sequences motivated us to investigate these
properties for a wide range of constraints. We will briefly present
the most remarkable results obtained in the maxentropic case. We
characterize DCRLL sequences by three integer parameters $(d, k, N)$,
denoting that the runlengths occurring in these sequences are
constrained between $d + 1$ and $k + 1$, and the charge or running
digital sum (RDS) assumes $N$ distinct values. Note that we consider
sequences of symbols drawn from $\{-1, 1\}$, and that the constraints
satisfy $0 \le d < k \le N - 2$. In order to represent the $(d, k, N)$
constraints we use the concept of runlength graphs described by Kerpez
et al. [1], and we interpret a maxentropic DCRLL sequence as generated
by a stationary Markov chain based on such a graph. This Markov chain
description allows the evaluation of the power spectral density
function $H(\omega)$, the sum variance $\sigma_z^2(d, k, N)$ (i.e. the
variance of the RDS), the runlength distribution, and the average
runlength of maxentropic DCRLL sequences [1].

II.  The  Main  Numerical  Results 

The analysis of the sum variance $\sigma_z^2(d, k, N)$ in the
practically interesting range of constraints ($0 \le d \le 2$,
$d < k \le N - 2$, $9 \le N \le 30$) reveals that it is, to a good
approximation, determined by $N$ and roughly independent of the $d$
and $k$ constraints. Hence, we can approximate $\sigma_z^2(d, k, N)$
by the known expression for the sum variance of maxentropic pure
charge constrained sequences, i.e.,
$\sigma_z^2(d, k, N) \approx (1/12 - \pi^{-2}/2)(N + 1)^2$ [2]. For
the analyzed range of $(d, k, N)$ constraints, this approximation is
within 5% accuracy as long as $k - d > \lfloor N/2 \rfloor$. For $k$
constraints only slightly larger than $d$, the true sum variances
$\sigma_z^2(d, k, N)$ are somewhat less than the above approximation.
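
For orientation, the closed-form approximation is easy to tabulate. The sketch below simply evaluates $(1/12 - \pi^{-2}/2)(N+1)^2$ over the analyzed range of $N$; it does not compute the exact sum variance from the Markov chain, and the function name is hypothetical.

```python
import math

def sum_variance_approx(N):
    """Approximate sum variance for N distinct RDS values."""
    return (1.0 / 12.0 - 1.0 / (2.0 * math.pi ** 2)) * (N + 1) ** 2

if __name__ == "__main__":
    for N in range(9, 31, 3):
        print(N, round(sum_variance_approx(N), 3))
```

The leading constant is roughly 0.0327, so the approximate variance grows quadratically with the number of RDS values.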

In the case of maxentropic pure charge constrained sequences, there is
a simple relation between sum variance and low-frequency
characteristic [2]. We are interested in whether a corresponding
relation exists for maxentropic DCRLL sequences. We express the
low-frequency characteristic by a well-defined cut-off frequency. In
order to provide a clear physical interpretation, we define the
cut-off frequency $\omega_c$ of a maxentropic DCRLL sequence by
$H(\omega_c) = H_0(d, k)/2$, where $H_0(d, k)$ denotes the DC-content
of a maxentropic runlength-limited sequence with parameters $(d, k)$
[2]. Indeed, for $N \gg 1$ we could find the relation
$\omega_c \approx H_0(d, k)[2\sigma_z^2(d, k, N)]^{-1}$ between sum
variance and cut-off frequency. For $d$ constraints $0 \le d \le 2$,
this approximation is within 10% accuracy as long as $N > 17$. We
conclude that for $N$ sufficiently large the sum variance
$\sigma_z^2(d, k, N)$ is a useful criterion of the low-frequency
characteristic of a maxentropic DCRLL sequence, a fact which again
justifies the definition of the cut-off frequency $\omega_c$.

As shown in [2], maxentropic pure charge constrained sequences have
the fundamental property that the product of sum variance and
redundancy is approximately constant. By introducing a refined
redundancy definition, it turns out that for $N \gg 1$ a corresponding
relation also holds for maxentropic DCRLL sequences. Let the extra
redundancy be defined as $C(d, k, \infty) - C(d, k, N)$, where
$C(d, k, N)$ denotes the capacity of the $(d, k, N)$ constraint. In
other words, the extra redundancy describes the increment in
redundancy from the $(d, k)$ runlength constraint to the $(d, k, N)$
constraint. For maxentropic DCRLL sequences, we found that the product
of sum variance and extra redundancy for $N \gg 1$ assumes a constant
value which is determined by the $d$ and $k$ constraints. In the
absence of a specific $k$ constraint (i.e. $k = N - 2$) and for $d$
constraints $0 \le d \le 2$, for example, this sum variance-extra
redundancy product appears to be constant for about $N > 20$.

III.  A  Neat  Analytical  Result 

Kerpez et al. [1] present a closed-form expression for the Markov
chain description of a maxentropic pure charge constrained sequence,
where the constraint is represented by a runlength graph. Using this
result, we are able to derive a closed-form expression for the
runlength distribution of such a sequence. For $N \gg 1$, we can
replace the sums occurring in this expression by integrals which can
be solved using some trigonometric manipulation. In this way, for
$N \gg 1$ we approximately obtain the probability of occurrence of a
run of length $l$ in such a sequence as
$\Pr(l) \approx k_N(l)\lambda^{-l}$, where
$l \in \{1, 2, \ldots, N-1\}$, $\log_2 \lambda$ denotes the capacity
of the charge constraint (i.e. $\lambda = 2\cos[\pi(N+1)^{-1}]$), and

$$k_N(l) = (l - N + 1)(N+1)^{-1}\cos[\pi(N - 1 + l)(N+1)^{-1}]
+ \pi^{-1}\sin[\pi(N - 1 - l)(N+1)^{-1}].$$
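
The approximation can be evaluated directly. The sketch below transcribes the expression for $k_N(l)$ as it is reconstructed above from the printed text (the exact placement of $l$ inside the trigonometric arguments is a reading of a poorly legible formula) and sums the resulting probabilities; the function names are hypothetical.

```python
import math

def charge_lambda(N):
    """lambda = 2 cos(pi / (N + 1)); log2(lambda) is the charge-constraint capacity."""
    return 2.0 * math.cos(math.pi / (N + 1))

def k_factor(l, N):
    """Prefactor k_N(l) of the approximate runlength distribution."""
    return ((l - N + 1) / (N + 1)) * math.cos(math.pi * (N - 1 + l) / (N + 1)) \
        + math.sin(math.pi * (N - 1 - l) / (N + 1)) / math.pi

def runlength_probability(l, N):
    """Approximate probability of a run of length l, for 1 <= l <= N - 1."""
    return k_factor(l, N) * charge_lambda(N) ** (-l)

if __name__ == "__main__":
    N = 30
    probs = [runlength_probability(l, N) for l in range(1, N)]
    for l, p in enumerate(probs[:6], start=1):
        print(l, round(p, 4))
    print("total:", round(sum(probs), 4))
```

For large $N$ the values stay close to the unconstrained distribution $2^{-l}$ for short runs, and the total is close to one, as expected of an approximate probability distribution.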

Abstract - In this paper two new methods for the detection of
multiple insertions/deletions are presented. The first method
recognises insertions/deletions in the previous symbol stream by
extracting additional information from commonly used markers. A new
coding method is also presented that relies on the number of
transitions in a codeword to detect insertions/deletions in the
previous codeword. This coding scheme also has certain spectral
properties.

I.  Introduction 

The insertion/deletion of symbols in a codeword results in a change in
the length of the word. As a result, frame alignment is lost. This
case should be distinguished from the case of additive errors, where
only certain symbols in codewords are changed.

Recently  [1],  a  coding  algorithm  was  developed  that  generates 
codes  with  the  ability  to  correct  several  insertions/deletions, 
assuming  that  the  codeword  boundaries  were  known. 

Two  approaches  are  presented  in  this  paper.  In  section  II  a 
marker  method  is  described  that  enables  the  receiver  to  detect 
insertions/deletions  in  the  previous  frame.  This  is  the  first  step  to 
correct  insertions/deletions.  In  section  III,  a  simple  coding 
method  is  presented  to  detect  insertions/deletions  in  every 
codeword,  utilising  the  number  of  transitions  in  every  codeword. 

II.  Markers 

Markers (known sequences of symbols) are used to delineate a stream of
symbols into frames. A comprehensive overview of markers is presented
in [2]. Additional information can be extracted from commonly used
markers for the detection of insertions/deletions. If an
insertion/deletion occurred in the preceding frame, the marker is
shifted left or right. The decoder then sees, in the expected marker
position, a shifted version of the marker together with unknown
adjacent symbols from the data stream.

Markers are chosen in such a way that the resulting sequences
described above are uniquely recognisable. All possible resulting
sequences are stored in a lookup table, which enables the decoder to
detect a) that insertions/deletions occurred, and b) the number of
shifts.

The  functionality  of  the  proposed  scheme  can  be  extended  to 
include  the  detection  of  additive  errors  in  the  marker. 


III.  Insertion/Deletion  Detecting  Code 

A  new  insertion/deletion  detecting  code  is  introduced  that  relies 
on  a  constant  number  of  transitions  from  0  to  1  or  1  to  0  in  the 
symbols  of  the  codeword.  Insertions/deletions  in  the  preceding 
codeword  are  detected. 


Each  codeword  consists  of  three  sections:  a  head,  middle  and  tail 
section.  The  middle  section  consists  of  a  constant  number  of 
transitions  while  the  head  and  tail  sections  act  as  buffers. 

Insertions/deletions in the preceding codeword introduce shifts. The
codewords are chosen in such a way that the number of transitions in
the middle section increases or decreases with left and right shifts.
The decoder counts the number of transitions in the middle section of
the expected codeword. In this way the decoder recognises the
occurrence of insertions/deletions in the previous symbol stream and
can correct the frame alignment.
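
The counting step performed by the decoder can be sketched as follows. The abstract does not give the codeword design or the section lengths, so the `head` and `tail` parameters and the example word are purely hypothetical; only the transition count itself is illustrated.

```python
def count_transitions(bits):
    """Number of 0->1 and 1->0 transitions between adjacent symbols."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def middle_transitions(word, head, tail):
    """Transitions inside the middle section, ignoring `head` leading and
    `tail` trailing buffer symbols (hypothetical section lengths)."""
    return count_transitions(word[head:len(word) - tail])

if __name__ == "__main__":
    word = "0011010"
    print(count_transitions(word))
    print(middle_transitions(word, head=2, tail=1))
```

A decoder built on this idea would compare the measured count with the constant count expected for a correctly aligned codeword and treat any deviation as evidence of a shift.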

These  codes  are  very  flexible  and  good  rates  can  be  obtained. 
The  implementation  of  these  codes  is  easy  and  only  a  few  simple 
logic  gates  are  necessary  to  enable  the  detection  of 
insertions/deletions.  Lookup  tables  are  not  required  for  the 
detection  of  insertions/deletions  as  only  transitions  are  monitored. 
These  codes  can  be  used  to  maintain  frame  synchronisation.  If 
the  synchronisation  is  lost,  only  a  few  codewords  have  to  be 
examined  to  resynchronise  as  opposed  to  a  number  of  frames  in 
the  case  of  only  a  marker  being  used  once  in  a  frame  of  a  few 
hundred  symbols. 

As  a  result  of  the  use  of  transitions,  certain  spectral  density 
properties  are  obtained.  The  lower  the  number  of  transitions,  the 
lower  the  peak  energy  content  will  be  in  the  power  spectral 
density  of  the  code  and  vice  versa.  As  a  result,  these  codes  can 
be  useful  for  both  insertion/deletion  detection  and  spectral 
shaping. 

IV.  Conclusion 

In  this  paper  two  methods  were  presented  to  detect  the 
occurrence  of  insertions/deletions.  Both  methods  are  simple  to 
decode  and  are  of  low  complexity. 

Abstract - A construction is suggested of a code which corrects
single bit-shift errors in $(d, k)$ modulation codes. The codes are
nearly optimal in redundancy. The encoding and decoding procedures are
linear in the codeword length.

Magnetic and magneto-optical data recording uses a transition from one
direction of magnetization to another to represent 1, and an absence
of transition to represent 0. For physical and technological reasons
the number of zeroes between two successive transitions is limited by
a minimum $d$ and a maximum $k$. Codes that satisfy these constraints
are known as $(d, k)$ run-length-limited modulation codes [1, 2].

An important type of error in magnetic recording is a shift of the
border between two magnetic domains, i.e. a displacement of the
position of a 1 (which cannot be corrected by the usual write
precompensation technique). These are the so-called bit-shift errors
[3]. The usual error-correcting codes, such as those for the binary
symmetric channel, are not well suited to this type of error. This
paper suggests constructions of codes correcting single displacement
errors.

Let $n$ be the length of codewords in a $(d, k)$ code. Then the
maximum number of ones (nonzero components) in a codeword is
$m = \lfloor n/d \rfloor$.

Consider first the case when a nonzero component can only be shifted
by one position to the left or to the right. Denote by $x_i$ the
position of the $i$-th nonzero component
($1 \le x_i \le n$, $1 \le i \le m$). Now, on top of the modulation
$(d, k)$ constraints, we will require that a codeword satisfy the
following condition:

$$S_1 = \sum_i i\, x_i \equiv 0 \pmod{2m + 1} \qquad (1)$$

where  the  sum  is  taken  over  all  nonzero  components  of  a  code¬ 
word.  Generation  of  codewords  which  satisfy  condition  (1) 
can  be  conveniently  combined  with  satisfying  modulation  con¬ 
straints  by  modification  of  the  inverse  enumeration  algorithm 
suggested  by  Fitingof  [4]. 

The sum $S_1$ is the syndrome. If $S_1 \le m$, then $S_1 = i$, where
$i$ is the index of the nonzero component shifted to the right. If
$S_1 \ge m + 1$, then $2m + 1 - S_1 = i$, where $i$ is the index of the
nonzero component shifted to the left. Thus, error correction is quite
simple.
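
The single-shift decoding rule can be sketched in a few lines. The code below assumes 1-based positions, a received word containing at most one component shifted by one position, and an example codeword (n = 12, d = 2, ones at positions 1 and 6) chosen by hand so that condition (1) holds; the function names are hypothetical and the enumerative encoder of [4] is not reproduced.

```python
def one_positions(word):
    """1-based positions x_1 < x_2 < ... of the nonzero components."""
    return [idx + 1 for idx, bit in enumerate(word) if bit == 1]

def syndrome(word, m):
    """S_1 = sum_i i * x_i mod (2m + 1)."""
    xs = one_positions(word)
    return sum(i * x for i, x in enumerate(xs, start=1)) % (2 * m + 1)

def correct_single_shift(word, m):
    """Undo at most one shift of a single 'one' by one position."""
    s = syndrome(word, m)
    if s == 0:
        return list(word)                  # no error detected
    if s <= m:
        i, step = s, -1                    # i-th one moved right: move it back left
    else:
        i, step = 2 * m + 1 - s, +1        # i-th one moved left: move it back right
    x = one_positions(word)[i - 1]
    fixed = list(word)
    fixed[x - 1] = 0
    fixed[x - 1 + step] = 1
    return fixed

if __name__ == "__main__":
    n, d = 12, 2
    m = n // d                             # m = floor(n/d) = 6, modulus 13
    codeword = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]   # 1*1 + 2*6 = 13
    received = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # second one shifted right
    print(syndrome(codeword, m), syndrome(received, m))
    print(correct_single_shift(received, m) == codeword)
```

Because the syndrome identifies both the index of the shifted component and the direction of the shift, decoding needs only one pass over the received word.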

The code is nearly optimal. Indeed, since condition (1) and the
modulation $(d, k)$ constraints are independent, one can expect that
the number of codewords which satisfy (1) is smaller than the size of
the $(d, k)$ modulation code by a factor of $2m + 1$. But $2m + 1$ is
the maximum number of possible errors to be corrected (including the
error-free case). The deviation from optimality is due to the fact
that the actual number of possible errors in a given codeword can be
smaller than the maximum, because of a smaller number of nonzero
components.

Consider now a more general case, when one of the nonzero components
can be displaced up to $r$ positions to the right or to the left.
Thus, the displacement $g$ satisfies

$$-r \le g \le r \qquad (2)$$

Construct a sequence of natural numbers $(p_i)$, $1 \le i \le m$, in
the following way:

1. $p_1 = 1$, $p_2 = r + 1$

2. $p_i$ is relatively prime with $p_1, p_2, \ldots, p_{i-1}$

The codewords should satisfy the following condition:

$$S_r = \sum_i p_i x_i \equiv 0 \pmod{2 r p_m + 1} \qquad (3)$$

The  encoding  procedure  is  similar  to  that  for  the  case  r  =  1. 
The  error-correcting  properties  of  the  code  are  based  on  the 
following  theorem. 

Theorem 1 For any two distinct single errors, i.e. for any two
displacements, $g$ of the $i$-th nonzero component and $h$ of the
$j$-th nonzero component, the syndromes are different:

$$S_r(g, i) \ne S_r(h, j) \qquad (4)$$

($-r \le g \le r$, $-r \le h \le r$, $1 \le i \le m$, $1 \le j \le m$).


The correction is still simple:

1. Calculate

$$S'_r = \begin{cases} S_r & \text{if } S_r \le r p_m \\
S_r - 2 r p_m - 1 & \text{if } S_r > r p_m \end{cases}$$

2. Find the largest $g \le r$ such that $S'_r/g$ is an integer and
$|S'_r/g| > r$.

Then $|S'_r/g| = p_i$, where $i$ is the index of the displaced
component, and $g \cdot \operatorname{sgn} S'_r$ is the displacement.

Since the enumeration and inverse enumeration algorithms for $(d, k)$
decoding and encoding suggested in [4] have complexity linear in the
length of the codewords, the same is true for our error-correcting
code.

Summary

A common use of Gray codes is in reducing quantisation errors in
various types of analogue-to-digital conversion systems [1, 2]. As a
typical example, a length n Gray code can be used to record the
absolute angular positions of a rotating wheel by encoding the
codewords on n concentrically arranged tracks. n reading heads,
mounted radially across the tracks, suffice to recover the codewords,
and it is well known that quantisation errors are minimised by using a
Gray encoding.

When  high  resolution  is  required,  the  need  for  a  large 
number  of  concentric  tracks  results  in  encoders  with  large 
physical  dimensions.  This  poses  a  problem  in  the  design 
of  small-scale  or  high-speed  devices.  We  propose  single- 
track  Gray  codes  as  a  way  of  overcoming  this  problem. 
Let $W_0, W_1, \ldots, W_{p-1}$ be the codewords of a Gray code $C$
and write $W_i = [w_i^0, w_i^1, \ldots, w_i^{n-1}]^T$. We call the
sequence $w_0^j, w_1^j, \ldots, w_{p-1}^j$ component sequence $j$ of
$C$.

Definition 1 If for each $1 \le j < n$, component sequence $j$ of $C$
is a cyclic shift by some $k_j$ of component sequence 0, i.e.
$w_i^j = w_{i+k_j}^0$ (where subscripts are reduced modulo $p$), then
we say that $C$ is a single-track Gray code.

In a single-track Gray code, codeword $W_i$ is actually equal to
$[w_i^0, w_{i+k_1}^0, \ldots, w_{i+k_{n-1}}^0]^T$ and so, in the
application above, the bits of any codeword can be obtained solely
from a single track corresponding to component sequence 0. The n
reading heads are then spaced around that single track at fixed
relative positions $0, k_1, k_2, \ldots, k_{n-1}$. So, if a suitable
single-track Gray code is available, the respective encoder can be
made significantly smaller in size.

Necessary  conditions  on  the  parameters  n  and  p  of  a  single- 
track  Gray  code  are  easily  established: 

Lemma 2 Suppose there exists a length $n$ single-track Gray code with
$p$ codewords. Then $p$ is an even multiple of $n$ and
$2n \le p \le 2^n$.

We  are  interested  in  two  problems.  Firstly,  for  a  given  n, 
obtaining  a  single-track  Gray  code  with  as  many  codewords 
as  possible,  and  secondly,  for  a  given  number  of  codewords  (i.e. 
resolution),  obtaining  a  code  with  the  smallest  possible  length 
n  (i.e.  number  of  reading  heads).  Codes  are  easily  obtained  for 
n  =  1,2,3.  However,  for  larger  n,  the  construction  of  codes 
poses  an  interesting  combinatorial  problem.  Though  not  ruled 
out  by  the  necessary  conditions,  there  is  in  fact  no  length  4 
code  containing  all  16  words.  Thus  the  conditions  of  Lemma 
2  are  not  sufficient.  We  have  obtained  good  codes  by  hand 


for  small  n.  The  number  of  words  in  these  codes  and  the 
corresponding  bound  from  Lemma  2  are  shown  in  the  table 
below. 


n    Number of codewords    Upper bound from Lemma 2
4    8                      16
5    30                     30
6    60                     60
7    126                    126
8    240                    256
9    360                    504
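
The upper-bound column follows directly from Lemma 2: it is the largest even multiple of n that does not exceed $2^n$. A minimal sketch (the function name is hypothetical) that reproduces the column:

```python
def lemma2_upper_bound(n):
    """Largest p that is an even multiple of n (a multiple of 2n) with p <= 2**n."""
    step = 2 * n
    return (2 ** n // step) * step

if __name__ == "__main__":
    # Codeword counts found by hand, as listed in the table.
    found = {4: 8, 5: 30, 6: 60, 7: 126, 8: 240, 9: 360}
    for n, p in found.items():
        print(n, p, lemma2_upper_bound(n))
```

The codes for n = 5, 6 and 7 meet the bound, while those for n = 4, 8 and 9 fall short of it.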

As  an  example,  our  length  5  single-track  Gray  code  with 
30  codewords  is: 

001111000110000000011111111100 

000110000000011111111100001111 

000000011111111100001111000110 

011111111100001111000110000000 

111100001111000110000000011111 
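
The listed code can be checked mechanically. The sketch below reads the five rows as component sequences (tracks), takes the columns as codewords, and tests the cyclic Gray property and the single-track property; the sign convention used for the shift (track j equals track 0 rotated left by $k_j$) is an assumption of this sketch.

```python
TRACKS = [
    "001111000110000000011111111100",
    "000110000000011111111100001111",
    "000000011111111100001111000110",
    "011111111100001111000110000000",
    "111100001111000110000000011111",
]

def codewords(tracks):
    """Columns of the track array, one codeword per angular position."""
    return [tuple(t[i] for t in tracks) for i in range(len(tracks[0]))]

def is_cyclic_gray(words):
    """All words distinct and cyclically consecutive words differ in one bit."""
    if len(set(words)) != len(words):
        return False
    for a, b in zip(words, words[1:] + words[:1]):
        if sum(x != y for x, y in zip(a, b)) != 1:
            return False
    return True

def shift_of(track, base):
    """Smallest k with track == base rotated left by k, or None."""
    for k in range(len(base)):
        if track == base[k:] + base[:k]:
            return k
    return None

if __name__ == "__main__":
    words = codewords(TRACKS)
    print(len(words), is_cyclic_gray(words))
    print([shift_of(t, TRACKS[0]) for t in TRACKS])
```

Under this convention the head positions come out as multiples of 6, i.e. the five heads are equally spaced around the 30-position track.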

The  code  for  n  =  9  is  particularly  useful,  as  it  gives  a  one- 
degree  resolution  using  the  least  possible  number  of  reading 
heads. 

Our  other  contribution  is  a  general  construction  yielding 
codes  for  a  large  variety  of  parameters  and  leading  to  the 
following: 

Theorem 3 Suppose $n \ge 4$. Then there exists a length n
single-track Gray code with nt codewords for every even t
satisfying $2 \le t \le 2^{\,n - \lceil \sqrt{2(n-3)} \rceil - 1}$.

These  codes  are  not  in  general  optimal  with  respect  to  the 
conditions  of  Lemma  2.  We  propose  as  open  problems  finding 
better  or  even  optimal  single-track  Gray  codes  for  larger  n, 
and  obtaining  a  stronger  upper  bound  on  p  than  that  given 
by  Lemma  2. 

Abstract — The problem of optimizing the structure
of the encoder/decoder pair in a discrete communication
system (with an additive distortion measure) is
expressed in terms of a Bilinear Programming Problem
(BLP Problem). An efficient method, based on
the simplex search in conjunction with the Generalized
Upper Bounding Technique, is presented for the
solution. The special features of the problem are exploited
to reduce the computational complexity of the
proposed algorithm.

I.  Introduction 

Consider a discrete communication system composed of a
source S, a channel C, an encoder $\xi$ and a decoder $\eta$. The
source S is composed of $N_s$ symbols $s_i$, $i = 0, \ldots, N_s - 1$. The
symbol $s_i \in S$ occurs with probability $P_s(i)$. A measure of
distortion is defined between each pair of source symbols
$s_i, s_j \in S$, $i, j = 0, \ldots, N_s - 1$. The channel C is composed of
$N_c$ symbols $c_i$, $i = 0, \ldots, N_c - 1$. The symbol $c_i \in C$ occurs
with probability $P_c(i)$ and has an energy of $E_c(i)$. This results
in an average energy of $\sum_{i=0}^{N_c-1} P_c(i) E_c(i)$ at the channel
input. The transition probabilities of the channel are denoted
as $T_c(j|i)$.

The encoder provides a mapping, denoted as $\xi$, from the set
of source symbols to the set of channel symbols such that the
ith source symbol, $i = 0, \ldots, N_s - 1$, is mapped to the channel
symbol indexed by $\xi(i) \in [0, N_c - 1]$. Each source symbol
is encoded to a specific channel symbol; however, (i) several
source symbols may be encoded to the same channel symbol,
and (ii) some of the channel symbols may not be used. The
decoder provides a mapping, denoted as $\eta$, from the set of
channel symbols to the set of source symbols such that the
ith channel symbol, $i = 0, \ldots, N_c - 1$, is mapped to the source
symbol indexed by $\eta(i) \in [0, N_s - 1]$. Each channel symbol is
decoded to a specific source symbol; however, several channel
symbols may be decoded to the same source symbol.

Our objective is to optimize the two mappings, namely $\xi$, $\eta$,
to minimize the average distortion between the encoder input
and the decoder output. The introduced formulation optimizes
the combined effects of source quantization and channel
coding on the end-to-end distortion. Quantization of the
source symbols occurs when several source symbols are encoded
to the same channel symbol. Channel coding occurs
when some of the channel symbols are not used at all. In
the following, this optimization problem is formulated as a
zero-one program.

1This  work  was  supported  by  Natural  Sciences  and  Engineering 
Research  Council  of  Canada  (NSERC). 


II.  Zero-one  formulation  of  the  problem 

We assign an $N_c$-dimensional binary vector to each symbol
of the source at the channel input. The vector corresponding
to the ith source symbol, $i = 0, \ldots, N_s - 1$, is denoted as
$e_i = [e_{ij},\ j = 0, \ldots, N_c - 1]$. We impose the constraints that
$e_{ij} \in \{0, 1\}$ and $\sum_j e_{ij} = 1$, $\forall i$. If the ith source symbol is
encoded to the lth channel symbol, we set $e_{ij} = 1$, $j = l$ and
$e_{ij} = 0$, $j \ne l$. Similarly, we assign an $N_s$-dimensional binary
vector to each channel symbol at the decoder side. The vector
corresponding to the jth channel symbol, $j = 0, \ldots, N_c - 1$, is
denoted as $d_j = [d_{ij},\ i = 0, \ldots, N_s - 1]$. We impose the constraints
that $d_{ij} \in \{0, 1\}$ and $\sum_i d_{ij} = 1$, $\forall j$. If the jth
channel symbol is decoded to the lth source symbol, we set
$d_{ij} = 1$, $i = l$ and $d_{ij} = 0$, $i \ne l$. Using these notations, the
optimization problem is formulated as:


This optimization problem is transformed into a Bilinear
Programming Problem (BLP Problem) [1]. The problem has
some special features which substantially facilitate its solution.
These features are: (i) Existence of the Generalized
Upper Bounding (GUB) constraints for both encoder and decoder.
(ii) The encoder structure has only one extra constraint
in addition to the GUB's, namely the energy constraint.
(iii) The decoder constraints are all GUB's and consequently
the linear program involved in the optimization of the decoder
is decomposable. Using these features, an efficient method
based on a variant of the simplex search is presented for the
solution.
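
The zero-one variables of Section II, and the decomposition of the decoder problem mentioned in feature (iii), can be illustrated on a toy instance. The sketch below is not the simplex/GUB method of the paper: it enumerates every encoder/decoder pair by brute force, ignores the energy constraint, and uses an assumed example (two equiprobable source symbols, a binary symmetric channel with crossover 0.1, Hamming distortion). The distortion matrix `dist`, the array `t_c[j][k]` (probability of receiving k when j is sent) and all function names are assumptions introduced for illustration.

```python
from itertools import product

def one_hot_encoder(enc, n_c):
    # e[i][j] = 1 iff source symbol i is encoded to channel symbol j
    return [[int(enc[i] == j) for j in range(n_c)] for i in range(len(enc))]

def one_hot_decoder(dec, n_s):
    # d[i][j] = 1 iff channel symbol j is decoded to source symbol i
    return [[int(dec[j] == i) for j in range(len(dec))] for i in range(n_s)]

def average_distortion(p_s, t_c, dist, e, d):
    # Bilinear in the encoder variables e and the decoder variables d.
    n_s, n_c = len(e), len(e[0])
    return sum(p_s[i] * e[i][j] * t_c[j][k] * d[l][k] * dist[i][l]
               for i in range(n_s) for j in range(n_c)
               for k in range(n_c) for l in range(n_s))

def best_decoder(p_s, t_c, dist, enc):
    # With the encoder fixed, each received symbol k is decided on its own.
    n_s, n_c = len(p_s), len(t_c)
    def cost(k, l):
        return sum(p_s[i] * t_c[enc[i]][k] * dist[i][l] for i in range(n_s))
    return [min(range(n_s), key=lambda l: cost(k, l)) for k in range(n_c)]

def brute_force(p_s, t_c, dist):
    n_s, n_c = len(p_s), len(t_c)
    best = None
    for enc in product(range(n_c), repeat=n_s):
        for dec in product(range(n_s), repeat=n_c):
            val = average_distortion(p_s, t_c, dist,
                                     one_hot_encoder(enc, n_c),
                                     one_hot_decoder(dec, n_s))
            if best is None or val < best[0]:
                best = (val, enc, dec)
    return best

P_S = [0.5, 0.5]
T_C = [[0.9, 0.1], [0.1, 0.9]]
DIST = [[0, 1], [1, 0]]

if __name__ == "__main__":
    print(brute_force(P_S, T_C, DIST))
    print(best_decoder(P_S, T_C, DIST, (0, 1)))
```

Brute force is only viable for very small alphabets; it is used here solely to make the objective and the constraint structure concrete.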

Abstract — We address the problem of designing
a coded modulation scheme for the fading channel
when space diversity is used. We focus on
the fact that a channel affected by fading can be
asymptotically turned into an additive white Gaussian
noise (AWGN) channel by increasing the number
of diversity branches, thus turning coded-modulation
schemes designed for the AWGN channel into efficient
codes over the fading channel.

I.  Introduction 

The severe performance degradation effects associated
with flat fading in radio channels are well known. Similarly
well known is the fact that, when coping with fading, an
alternative option to increased power is the use of multiple-
receiver techniques categorized under the name of diversity.
Recently, coded modulation has been regarded as a way of
introducing time diversity. Actually, the effect of increasing
the Hamming distance between pairs of possible symbol
sequences transmitted over the flat Rayleigh-fading channel is
the same as induced by increasing the number of branches in
space diversity. One problem with this approach is that the
design criteria for coded modulation schemes in fading channels
differ from the standard minimum-Euclidean-distance
criterion valid for the AWGN case. Consequently, a code
optimal for the AWGN channel may perform poorly on a fading
channel and vice versa.

We study the synergy of space diversity and code diversity.
In particular, we focus our analysis on the fact that
antenna diversity and maximal-ratio combining have the effect
of turning the equivalent transmission channel into an
AWGN channel. A structure of a receiver with constant total
gain is advocated. With it, when the space-diversity order is
M, the energy per diversity branch is decreased by a factor of
M, so that the average signal-to-noise ratio at the decoder input
remains the same irrespective of the diversity order. In
practical terms, we might think of an antenna array whose
number of elements is increased without increasing the total
area, so that the equivalent gain of the antenna is kept
constant. With this receiver, at no additional cost in terms
of antenna size, an optimal code for the AWGN channel can
achieve (asymptotically as the number of diversity branches
increases) the same optimal performance on a fading channel,
irrespective of the fading parameters. This asymptotic
performance is approached with only a few, highly correlated
diversity branches.

The following results were obtained:

•  Bounds on the bit error probability of a coded modulated
system with diversity, including branch correlation.

•  The cut-off rate of the diversity channel.

•  Simulation results based on simple coded modulation
schemes for 8-PSK with several detection strategies.

•  Rate of convergence of the fading channel to an
AWGN channel as the number of diversity branches
increases.

In the following we describe the last of these results.

¹This research was sponsored in part by the Human-Capital and

II.  Convergence  to  AWGN  channel 
With antenna diversity, coherent detection and perfect
channel-state information, the convergence to the AWGN channel
is very quick.

We observe that the divergence of the channel with diversity
from a channel without fading is due to the combination
of two factors, namely, divergence from Gaussianity
and a larger value of the noise variance. While the convergence
to a channel in which fading is simply wiped out is
important, the sheer convergence of the total disturbance to
a normal distribution (even with a slightly larger variance)
implies that coding schemes that have been optimized for a
Gaussian channel will perform closer and closer to optimality.
Convergence can be studied by examining the Kullback-
Leibler distance of the probability density function f(x) of
the total disturbance (fading plus noise) from a normal distribution
with a variance equal to that of the noise, which
we denote here by g(x), and from a normal distribution with
larger variance, denoted g'(x). The results obtained are reported
in Table 1.


  M    D(f || g)    D(f || g')
  2      0.474        0.167
  3      0.154        0.060
  4      0.076        0.031
  5      0.045        0.018
  6      0.030        0.012
  7      0.021        0.009
  8      0.016        0.007
  9      0.012        0.005
 10      0.010        0.004

Table 1: Kullback-Leibler distance of distributions f(x) and
g(x) and of distributions f(x) and g'(x).
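
A Kullback-Leibler distance of the kind tabulated above can be evaluated by direct numerical integration. The sketch below does not reproduce the density f(x) of the paper, which is not given in this abstract; as a stand-in it compares two zero-mean normal densities, for which the closed form D = (r - 1 - ln r)/2 with variance ratio r is known. The use of natural logarithms (nats) is an assumption, since the units of Table 1 are not stated.

```python
import math

def normal_pdf(x, var):
    """Zero-mean normal density with variance var."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def kl_divergence(f, g, lo, hi, steps=20000):
    """Midpoint-rule estimate of D(f || g) = integral of f ln(f/g), in nats."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        fx, gx = f(x), g(x)
        if fx > 0.0:                      # 0 * ln 0 is taken as 0
            total += fx * math.log(fx / gx) * h
    return total

def kl_normal_closed_form(var_f, var_g):
    """D(N(0, var_f) || N(0, var_g)) in nats."""
    r = var_f / var_g
    return 0.5 * (r - 1.0 - math.log(r))

if __name__ == "__main__":
    f = lambda x: normal_pdf(x, 1.2)      # stand-in for the total disturbance
    g = lambda x: normal_pdf(x, 1.0)      # normal density with the noise variance
    print(kl_divergence(f, g, -12.0, 12.0), kl_normal_closed_form(1.2, 1.0))
```

Replacing the stand-in `f` by the actual density of fading plus noise for a given diversity order M would give entries comparable to Table 1.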

Abstract — This paper presents a bandwidth
efficient multilevel concatenated coded modulation
scheme for reliable data transmission over the shadowed
mobile satellite communication (MSAT) channel.
In this scheme, bandwidth efficient block modulation
codes are used as the inner codes and Reed-
Solomon codes of various error correcting capabilities
are used as the outer codes. The inner and outer
codes are concatenated in multiple levels. A systematic
method for constructing multilevel concatenated
modulation codes is presented and a multistage closest
coset decoding for these codes is proposed. Specific
multilevel concatenated 8-PSK modulation codes
have been constructed. These codes are designed
to remove the error floor phenomenon or lower the
bit-error rate of the error floor caused by the large
Doppler frequency shift due to the motion of vehicles.
Simulation results show that these codes perform very
well and achieve large coding gains over the uncoded
reference modulation systems.

I.  Introduction 

In this paper, we propose and investigate multilevel concatenated
coded modulation schemes for the shadowed MSAT channel.
A statistical model for the shadowed mobile satellite
channel has been devised by Loo [1]. This model has been
used by other researchers [2, 3] to study error performances of
coded modulation schemes over the shadowed mobile satellite
communication channel. In Loo's model, there are three
different kinds of shadowing, i.e., light, average and heavy.
The corresponding Rician factors are 6.16, 5.46 and -19.33 dB,
respectively. Therefore, in the heavily shadowed MSAT channel,
which is statistically close to the Rayleigh fading channel,
a coded modulation system suffers very severe distortion due
to the randomly changing phase and the multipath fading. In
particular, if the Doppler frequency shift is large due to the
motion of the vehicle, a coded modulation system faces the error
floor phenomenon.
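
Converting the quoted Rician factors from decibels to linear scale makes the closeness of the heavy-shadowing case to Rayleigh fading concrete, since a Rician factor of zero corresponds to Rayleigh fading. A minimal sketch; the dictionary and function names are illustrative.

```python
RICIAN_FACTOR_DB = {"light": 6.16, "average": 5.46, "heavy": -19.33}

def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from decibels to linear scale."""
    return 10.0 ** (value_db / 10.0)

if __name__ == "__main__":
    for name, k_db in RICIAN_FACTOR_DB.items():
        print(f"{name:8s} K = {k_db:7.2f} dB = {db_to_linear(k_db):.4f}")
```

For heavy shadowing the direct component carries only about one percent of the power of the scattered component, whereas for light and average shadowing it is several times stronger.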

II.  Multilevel  Concatenated  BCM  Schemes 

Coded modulation in conjunction with concatenation is a powerful
technique for achieving high reliability, large coding
gain, and high spectral efficiency with reduced decoding
complexity. This combination of coded modulation and
concatenation is known as concatenated coded modulation
[4]. Error performance of the single-level concatenated
TCM and BCM schemes for the Rayleigh fading channel was
investigated by Vucetic and Lin in 1991 [5]. All these studies
showed that by properly choosing the inner codes and outer
codes, large coding gains and high spectral efficiency could be
achieved with reduced decoding complexity.

¹This research was supported by NSF Grants NCR-91-1540 and
NCR 94-15374 and NASA Grant NAG 5-931.

However, a major shortcoming of a single-level concatenated
coded system with a multilevel block modulation code as
the inner code is that the outer code corrects all the output
bits of the inner code decoder to the same degree. Since a
multilevel modulation code is constructed from component codes
with different distance profiles, multistage decoding results in
different bit-error probabilities for different component codes
at the output of the inner code decoder. As a result, the
overall error performance of a single-level concatenated coded
modulation system is dominated by the worst bit-error probability
of the component code of the modulation inner code.
To improve the overall error performance, it is necessary to
provide different levels of error protection for different inner
component codes in a concatenated coded modulation system.
One approach to this improvement is to use multilevel
concatenation with multiple outer codes to provide different
levels of error protection for different inner component codes.
Multilevel concatenation provides the flexibility of choosing
outer codes with different error correcting capabilities and
furthermore improves the spectral efficiency over the single-level
concatenation scheme.

Simulation results show that these codes achieve very impressive
real coding gains over the uncoded reference system
and single-level concatenated BCM codes using the same inner
codes.

I. INTRODUCTION

Signalling over Rayleigh fading channels can be classed as a general
Gaussian problem. Optimal linear filtering can then be applied to jointly
estimate the channel and detect the information sequence [1]. For
fading channels with non-Gaussian distributions, optimal linear filtering
does not necessarily yield the best channel estimates. To exploit the
channel memory, a first order finite Markov chain model (HMM) that
statistically characterizes the Nakagami-m fading process is used to
aid the channel estimation. Based on this, a maximum a posteriori
(MAP) receiver using coherent detection is presented for binary PAM
signals.

II. SYSTEM  MODEL  AND  THE  BRANCH  METRIC 

The system model is shown in Fig. 1 where g(t) is the multiplicative
Nakagami fading process. A first order finite state Markov chain model
for g(t) can be derived using the procedure described in [2].

Fig. 1 System Model

The shaping filter is selected so that when the received signal
is sampled with interval $T_s$, no intersymbol interference will occur. A
trellis can be set up for this receiver where its states are the states of the
Markov chain model. The branch metric for the trellis search is:

$[\,r_k - g_k b_k f(0)\,]^2 - 2\sigma^2 \ln \Pr(g_k \mid g_{k-1})$

where $\{b_k\}$ is the equivalent information sequence and $\sigma^2$ is the
variance of the noise that accounts for both the additive white Gaussian
noise and the modelling error of the fading process. The last term
accounts for the state transition probability of the Markov chain.
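
The structure of this metric, a squared Euclidean error plus a penalty of -2σ² times the log of the Markov transition probability, can be sketched as follows. The exact symbols of the printed formula are not fully legible in this copy, so the argument names (`r_k` for the received sample, `f0` for the pulse value at the sampling instant) and the one-step search helper are assumptions made for illustration; a full receiver would accumulate the metric along trellis paths.

```python
import math

def branch_metric(r_k, g_k, b_k, f0, noise_var, p_transition):
    """Squared error minus 2*sigma^2 times the log transition probability."""
    return (r_k - g_k * b_k * f0) ** 2 - 2.0 * noise_var * math.log(p_transition)

def best_branch(r_k, gains, transition_row, f0, noise_var, bits=(-1, 1)):
    """Pick the (next state, bit) pair with the smallest metric for one step."""
    candidates = [
        (branch_metric(r_k, gains[s], b, f0, noise_var, transition_row[s]), s, b)
        for s in range(len(gains)) if transition_row[s] > 0.0
        for b in bits
    ]
    _, state, bit = min(candidates)
    return state, bit

if __name__ == "__main__":
    # Two fading states with gains 0.5 and 1.5; leaving state 0 is assumed unlikely.
    print(best_branch(1.4, [0.5, 1.5], [0.8, 0.2], f0=1.0, noise_var=0.1))
```

The transition term acts as a prior: an unlikely jump of the fading state is accepted only when it reduces the squared error by more than its log-probability penalty.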

III.  SIMULATION  RESULTS 

Simulation has been done for binary PAM with coherent detection.
The Nakagami fading process is generated from a correlated Gaussian
process which in turn is generated by passing a white Gaussian process
through a second order low pass Butterworth filter whose cutoff frequency
determines the rate of fading. An 8-state Markov chain model
is used to represent the Nakagami fading process.

Fig. 2 shows the error performance for the MAP receiver for m=0.5
and 5.0. The bandwidth of the Butterworth filter is chosen to be 0.1 Hz,
which corresponds to fast fading. For comparison, the error performance
for receivers where the LMS algorithm is used to estimate the
channel is also given. Fig. 3 shows the error performance for two different
fading rates with m=2.0.

Fig. 2 Error Performance for Fast Fading

Fig. 3 Error Performance for Different Fading Rates, m=2.0

One can observe from the figures that the
MAP receivers using the Markov chain model give somewhat better error
performance for medium to high average SNR than receivers which
employ optimal linear filtering to estimate the channel. Also, the error
performance of the MAP receiver is not sensitive to the fading rate for
a fixed value of m. Simulations have also been done for m=1.0, 10.0
and 20.0, under different fading rates. The improvement of the error
performance using the Markov model becomes more significant as m
and/or the fading rate increases. The method is readily extended to
frequency selective fading channels with non-Gaussian distributions,
whereas the MLSE receiver proposed in [1] is difficult or impossible to
implement.

Abstract - In this paper, a burst error process is
characterized by three variables x, y, z related to
a two-state Gilbert-Elliott-Channel with fifty percent
bit error rate in the bad state. The variables
x and y describe the Markov process of the model.
The variable z reflects the mean bit error rate. On
such a channel, the probability of any single error
sequence and hence any collection thereof can
be represented by polynomials in x, y, z, which extends
the one-variable description in z applicable
for statistically independent errors.

I.  Burst  Error  Statistics 

On a binary channel, an n-bit error sequence
$w = e_1 e_2 \cdots e_n$, $e_\nu \in \{0,1\}$, can be considered as an elementary
error event. There are $N = 2^n$ distinct elementary events
$w_i$, $i = 0, 1, \ldots, N-1$, each occurring with probability
$P_i = P(w_i) = \mathrm{Prob}(w_i)$. A composite error event E is
given by the union of its constituent elementary error events
$w_i$, characterized by an appropriate index set $I_E$, i.e.

$E = \bigcup_{i \in I_E} w_i, \quad P_E = \mathrm{Prob}(E) = \sum_{i \in I_E} \mathrm{Prob}(w_i).$  (1)

Pertinent burst error statistics $P_E$ are, among others,
the error weight distribution P(m,n) and the error
correlation function $R(\tau)$. P(m,n) is the probability of
m errors in a block of n bits; $R(\tau) = \mathrm{Prob}(1\,e^{\tau-1}\,1)$ is the
probability of two errors occurring at a distance $\tau$.


II.  Burst  Error  Model 


The burst error process is modelled by a two-state
Gilbert-Elliott-Channel (GEC), characterized by the bit
error rates $p_G$ and $p_B$ associated with the good state G and
the bad state B, and the state transition probabilities
$P = \mathrm{Prob}(B|G)$ and $Q = \mathrm{Prob}(G|B)$, resp. The mean bit
error rate is $p_b = p_G \frac{Q}{P+Q} + p_B \frac{P}{P+Q}$. A reduced GEC with
$p_B = 0.5$ will be applied. The three remaining parameters
$p_G, P, Q$ are expressed by $x = P/Q$, $y = 1 - (P + Q)$, and
$z = 1 - 2p_b$. See [2] for physical interpretation. Describing
matrices are

$D = D_0 + D_1 = \frac{1}{1+x} \begin{bmatrix} 1 + xy & x - xy \\ 1 - y & x + y \end{bmatrix},$  (2)

$D_0 - D_1 = z \begin{bmatrix} 1 + xy & x - xy \\ 0 & 0 \end{bmatrix}.$  (3)

The stationary state distribution is $\sigma_0 = \frac{1}{1+x}\,[1,\ x]$.

Fig. 1: Trellis for evaluating $P(w_i)$, with outputs $P_0 = P(000)$,
$P_1 = P(001)$, ..., $P_7 = P(111)$.


III.  Polynomial  Representations 

The product formalism of stochastic automata theory

$P(w) = \sigma_0 D_{e_1} D_{e_2} \cdots D_{e_n} e^T, \quad e^T = (1, 1, \ldots, 1)^T$  (4)

can be used to show by complete induction that P(w) is
indeed a polynomial in x, y, z. Generalizing (4) yields

$R(\tau) = \sigma_0 D_1 D^{\tau-1} D_1 e^T = \tfrac{1}{4}\,[\,1 - 2z + (1 + x y^{\tau}) z^2\,].$  (5)

For statistically independent errors (x = 0) this reduces to
$(1 - z)^2/4 = p_b^2$, as it must.

Applying modal analysis [1,2], the probability vector
$P = [P(w_i)]$ can be expressed via Walsh-Hadamard-
Transformation of the spectral coefficient vector
$Q = [Q(w_i)]$, where $Q_i = Q(w_i)$ are simple polynomials in
x, y, z:

$P = 2^{-n} Q V_n, \quad V_n = \begin{bmatrix} V_{n-1} & V_{n-1} \\ V_{n-1} & -V_{n-1} \end{bmatrix}, \quad V_0 = [1].$  (6)

For n = 3, evaluation trellises are shown in Fig. 1, 2.
As the Hadamard matrix $V_n$ consists of entries +1 and
-1, resp., $P(w_i)$ and hence $P_E$ consist of appropriate
aggregates of $Q(w_i)$, e.g.

$P(2,3) = \tfrac{1}{8}\,[\,3 - 3z - 2(1 + xy) z^2 - (1 + x y^2) z^2 + 3 (1 + xy)^2 z^3\,].$  (7)
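
The polynomial expressions can be cross-checked against the matrix product (4). The sketch below builds the reduced GEC from assumed example values of $p_G$, P and Q, evaluates $R(\tau)$ and P(2,3) directly from the product formalism, and compares them with the closed forms (5) and (7), taking the linear term of (5) as $-2z$, which is what the matrix product yields. The convention $D_e = F_e A$ (error decision in the current state, then transition) and all function names are assumptions of this sketch.

```python
from itertools import product

def gec(p_g, p, q, p_bad=0.5):
    """Return stationary distribution, transition matrix A, and D_0, D_1."""
    a = [[1.0 - p, p], [q, 1.0 - q]]               # states ordered (G, B)
    err = [p_g, p_bad]
    d0 = [[(1.0 - err[s]) * a[s][t] for t in range(2)] for s in range(2)]
    d1 = [[err[s] * a[s][t] for t in range(2)] for s in range(2)]
    sigma0 = [q / (p + q), p / (p + q)]
    return sigma0, a, d0, d1

def chain_probability(sigma0, matrices):
    """Evaluate sigma0 * M_1 * M_2 * ... * e^T for 2x2 matrices."""
    v = list(sigma0)
    for m in matrices:
        v = [v[0] * m[0][t] + v[1] * m[1][t] for t in range(2)]
    return v[0] + v[1]

def max_deviation(p_g=0.01, p=0.05, q=0.2, max_tau=5):
    """Largest gap between the matrix products and the polynomials (5), (7)."""
    sigma0, a, d0, d1 = gec(p_g, p, q)
    x, y = p / q, 1.0 - (p + q)
    p_b = p_g * q / (p + q) + 0.5 * p / (p + q)
    z = 1.0 - 2.0 * p_b
    worst = 0.0
    for tau in range(1, max_tau + 1):
        direct = chain_probability(sigma0, [d1] + [a] * (tau - 1) + [d1])
        closed = 0.25 * (1.0 - 2.0 * z + (1.0 + x * y ** tau) * z * z)
        worst = max(worst, abs(direct - closed))
    by_bit = {0: d0, 1: d1}
    direct = sum(chain_probability(sigma0, [by_bit[b] for b in bits])
                 for bits in product((0, 1), repeat=3) if sum(bits) == 2)
    closed = (3.0 - 3.0 * z - 2.0 * (1.0 + x * y) * z ** 2
              - (1.0 + x * y * y) * z ** 2
              + 3.0 * (1.0 + x * y) ** 2 * z ** 3) / 8.0
    return max(worst, abs(direct - closed))

if __name__ == "__main__":
    print(max_deviation())
```

Because $D = D_0 + D_1$ equals the state transition matrix, the factor $D^{\tau-1}$ in (5) is implemented by repeating `a`.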

Abstract — We propose a new technique for coherent
transmission over multipath Rayleigh fading channels,
based on the use of one special case of time-frequency
well-localized orthonormal functions, namely the Prolate
Spheroidal Wave Functions (PSWF). Acceptable
SER performances are obtained up to values of about
0.1 of the channel's spread factor.

I.  Introduction 

Coherent signaling over very dispersive Rayleigh fading channels
is quite a challenging task. A classical rule of thumb to
respect in such cases is to choose a signaling symbol time interval
T verifying $T_m < T < 1/B_d$, where $T_m$ denotes the
time spread due to multipath propagation, and $B_d$ denotes
the Doppler spread bandwidth due to the envelope fading of the
individual paths. This is indeed possible when the channel's spread
factor $L = T_m B_d$ is very small ($L < 0.01$). Excellent results
have been achieved in such situations by the use of pilot symbols
in association with coded modulation [1]-[2]. When L
approaches unity, any attempt to make $T < 1/B_d$ will result
in severe multipath spreading.

In our work we consider coherent signaling over channels with
spread factors $T_m B_d < 0.1$. Our technique permits coherent
detection in situations traditionally reserved to non-coherent
reception, with the evident benefit of higher spectral efficiency.
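
The rule of thumb above amounts to simple arithmetic on $T_m$ and $B_d$. A minimal sketch with assumed example values (20 µs multipath spread, 500 Hz Doppler spread); the function names are illustrative.

```python
def spread_factor(t_m: float, b_d: float) -> float:
    """Spread factor L = T_m * B_d (dimensionless)."""
    return t_m * b_d

def symbol_interval_window(t_m: float, b_d: float):
    """Open interval (T_m, 1/B_d) of admissible T, or None if it is empty."""
    upper = 1.0 / b_d
    return (t_m, upper) if t_m < upper else None

if __name__ == "__main__":
    t_m, b_d = 20e-6, 500.0                  # assumed example channel
    print(spread_factor(t_m, b_d))           # about 0.01
    print(symbol_interval_window(t_m, b_d))  # (T_m, 1/B_d) in seconds
```

The ratio of the window's end points is 1/L, so the window narrows as L grows and is empty once L reaches 1.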

II.  Time-Frequency  Orthonormal  Bases 

Time-frequency localization operators are of interest for many
applications. A well known example of such operators is
the one presented by Slepian and Pollak [3]. In an extensive
series of articles ([3] and others) they studied the properties
of signal band- and time-limiting operators on the
$[-T/2, T/2] \times [-W, W]$ rectangle. They demonstrated that
the orthonormal family of Prolate Spheroidal Wave Functions
(PSWFs) is a complete basis of singular functions for the
above-mentioned operator.

Another example of such operators is the case of the Hermite
polynomial functions, which are the eigenfunctions of
the projection operator on disks of the time-frequency plane
($t^2 + \omega^2 \le R^2$) [4].

III.  Simultaneous  Data  and  Pilot  Symbols 

In this section we investigate the performances of a transmission
scheme well fitted to severely dispersive Rayleigh fading
channels (L ranging from about 0.01 to 0.1).

When the spread factor of the channel exceeds 0.01, solutions
based on the transmission of frames of N symbols with the
first symbol being the pilot and the N - 1 remaining symbols
carrying the data [2] do not work. In our approach we exploit
the orthogonality of the PSWFs in order to transmit in
the same time interval T both pilot and information symbols.
This is accomplished by simultaneous transmission of different
orthogonal PSWFs, one among these carrying the pilot
symbol. Fig. 1 shows computer simulated performances of the
proposed technique for several values of the spread factor L.
It confirms that for L < 0.01, the performance of our technique
is comparable with that obtained in [2]. Moreover, we
notice that the effect of the extending multipath spread is the
presence of a floor of the SER.

¹The work of E. Bejjani is supported by SETICS (Société
d'Études en Téléinformatique et Communications Systèmes),
within a PhD thesis contract in association with ENST (École
Nationale Supérieure des Télécommunications).

When  the  channel’s  spread  factor  exceeds  0.01,  the  advantage 
of  our  method  is  the  graceful  degradation  of  the  performances 
in  contrast  with  the  impossibility  of  implementing  the  method 
described  in  [2]. 

Results  for  the  QPSK  modulation  (not  shown)  are  only  a  few 
percent  worse  than  those  of  the  BPSK. 

IV.  Conclusion 

We showed the feasibility of coherent detection for multipath
fading channels with a spread factor reaching 0.1. It is possible
to achieve coherent transmission with even higher spread
factors, but on condition that additional processing is used.
We are presently working in this direction.
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I.  Introduction 

It  has  been  known  for  some  time  [1,2]  that  coded  modulation 
schemes  such  as  Ungerboeck‘s  [3]  which  are  optimised  for  the 
Gaussian  channel  do  not  perform  well  on  fading  channels,  and 
especially  on  the  Rayleigh  fading  channel.  Ungerboeck's  codes 
maximise  minimum  Euclidean  distance  between  coded  sequences; 
Divsalar  and  Simon  identified  a  number  of  parameters  that  should 
be  maximised  in  preference,  notably  the  minimum  Hamming 
distance  and  the  product  distance.  More  recently  it  has  been 
suggested  [4]  that  the  framework  of  multilevel  coded  modulation 
(MCM)  forms  a  suitable  basis  for  the  design  of  such  codes,  since  it 
enables  the  Hamming  distance  readily  to  be  maximised.  This  has 
the  further  advantage  [5]  that  decoders  may  be  implemented  using 
readily-available  ASICs  for  binary  convolutional  codes. 

In  most  of  this  work  the  aim  has  been  to  optimise  asymptotic 
performance  at  high  signal  to  noise  ratio  (SNR),  and  the  choice  of 
code  parameters  has  been  made  accordingly.  However,  it  is  well- 
known  (e.g.  [6])  that  this  may  not  optimise  performance  at  practical 
SNR.  This  paper  presents  a  design  technique  to  minimise  required 
SNR  performance  for  a  specific  target  bit  error  ratio  (BER).  It  also 
describes  a  new  simplified  bounding  technique  for  the  BER  of 
MCM  on  a  Rayleigh  fading  channel,  which  avoids  the  use  of 


III.  Design  and  performance  of  optimum  codes 

Using  this  technique  we  calculate  stage  BERs  for  a  given  scheme. 
The  overall  BER  is  then  the  sum  of  these.  It  has  been  noted  [4]  that 
many  MCM  codes  have  overall  BER  dominated  by  one  stage  of 
decoding.  Here  we  apply  the  principle  of  equalising  stage  BERs  at 
a  given  SNR,  hence  optimising  the  codes  for  this  SNR. 

This  method  has  been  applied  to  optimise  8-PSK  codes  for  BER  10 
3  and  10“ 6 .  Codes  with  encoder  memory  6  and  rates 
{Rhi  =  1..3}  =  {%.%.%}  and  {%%.%}  respectively  were 
selected  by  means  of  an  iterative  procedure  which  equalised  stage 
BERs  at  the  required  SNR.  For  code  rates  other  than  ]/k ,  punctured 
codes  are  used. 

Fig.  2.  compares  the  overall  BERs  of  these  codes  with  a  code  with 
equal  Hamming  distances  on  each  level  [4].  It  can  be  seen  that 
there  is  a  significant  performance  improvement  over  the  equal 
Hamming  distance  code,  of  about  2.3  dB  for  BER  10  This  code 
also  improves  on  the  Schlegel  and  Costello  code  with  the  same 
memory  [2]  by  over  4  dB  at  this  BER.  However,  asymptotically  the 
codes  described  here  have  much  poorer  performance. 

IV.  Conclusions 


Chemoff  bounds. 

n.  Bounds  on  BER  of  MCM  on  a  Rayleigh  channel 

The  principle  of  multistage  decoding  of  MCM  is  to  decode  each 
partition  of  the  signalling  constellation  separately,  treating  the 
remaining  partitions  as  uncoded.  This  is  of  course  sub-optimum.  It 
allows  us,  however,  to  treat  each  stage  of  the  decoding  process  as 
binary.  We  may  then  use  analytical  expressions  for  the  BER  of 
binary  signalling  on  a  Rayleigh  fading  channel  [7].  We  treat  a 
binary  code  with  minimum  free  distance  d  as  binary  signalling  with 
d  branch  diversity,  and  use  a  union  bounding  technique  similar  to 
that  described  in  [8].  From  [7]  p.  474,  the  BER  of  binary  signalling 
with  d-branch  diversity  is: 

?(*« ..(1, 


where  fj.  =  -J(Ec/Nq)/(\  +  Ec/Nq) 


Following  [6],  we  define  the  stage  BER  Pi  as  the  error  probability 
in  all  stages  due  to  errors  at  the  Ith  stage,  thus  including  error 
propagation,  allowed  for  in  the  factor  £[.  Note  that,  unlike  [4],  we 
do  not  assume  interleaving  between  stages.  Then  following  [8]  we 
calculate  an  estimate  of  Pi  taking  into  account  erroneous  paths  up 
to  Hamming  distance  dmax.  Suitable  values  for  dn ^  are  found  by 


comparison  with  simulations. 
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where  =  1  +  /?t+i/2/?{  ,i*  =  l,2;e3  =  1 

In an MCM decoder the error-weighted distance spectrum $A_d$ of the
code must be multiplied by the factor $a_i$, where
$\{a_i,\ i = 1..3\} = \{2, 2, 1\}$ is the number of neighbouring points
in the signalling constellation partition at each level.
$\{\Delta_i,\ i = 1..3\} = \{2\sin(\pi/8), \sqrt{2}, 2\}$ is the
partition minimum distance at each stage.


We have presented codes for the Rayleigh fading channel, optimised
for a given finite error rate, that achieve significant improvements
in coding gain over previously described codes. These codes are based
on multilevel coded modulation, and hence can be decoded using
multistage decoding with readily available Viterbi decoders. The
use of punctured codes also makes for a very flexible structure, in
which codes of different overall rates, and optimised for different
BERs, may readily be implemented.


Fig. 2. Comparison of overall BER for the two optimised codes (a,b) and the equal
Hamming distance code (c)

References 

1. Divsalar, D. and Simon, M. K. "The design of trellis coded MPSK for fading
channels: performance criteria" IEEE Trans. Commun., vol. 36, pp. 1004-1012,
September 1988

2. Schlegel, C. and Costello, D. J. "Bandwidth efficient coding for fading channels:
code construction and performance analysis" IEEE J. Select. Areas Commun., vol.
7, pp. 1356-1368, December 1989

3. Ungerboeck, G. "Channel coding with multilevel/phase signals" IEEE Trans.
Inform. Theory, vol. 28, pp. 55-67, January 1982

4. Seshadri, N. and Sundberg, C-E. W. "Multilevel coded modulations for fading
channels", Proc. 5th Int. Tirrenia Workshop (Elsevier, 1992), pp. 305-316

5. Viterbi, A. J. et al. "A pragmatic approach to trellis-coded modulation" IEEE
Communications Magazine, July 1989

6. Burr, A. G. and Lunn, T. J. "Code optimisation for finite error rate" Proc. IEEE
Symposium on Information Theory, San Antonio, Texas, January 1993, p. 67

7. Proakis, J. G. "Digital Communications" McGraw-Hill, 1983

8. Burr, A. G. "Bounds and approximations for the bit error probability of
convolutional codes" Electronics Letters, vol. 29, July 1993, pp. 1287-88




Distributed  Reception  of  Fading  Signals  in  Noise 

Rick  S.  Blum 

Electrical  Engineering  and  Computer  Science  Department, 

Lehigh  University,  Bethlehem,  PA  18015 


Abstract - A multiple antenna diversity scheme is
investigated for digital wireless communications. In
this scheme the antenna observations are immediately
quantized and only the quantized values are sent to a
fusion center to decide which symbol was transmitted.
The case where fine quantization is impractical is
considered, so that distributed detection principles apply.
The optimum reception scheme is described for the
case where frequency shift keying is employed. Multiple
bit quantization schemes are considered for cases
where the observations at each antenna are influenced
by slow Rayleigh fading and Gaussian additive noise.
Some numerical results are provided.

I.  Introduction 

There is significant interest in using wireless communication
systems in environments where severe multipath fading and
co-channel interference are present, which can limit system
performance [1]. To mitigate the effects of multipath fading and
co-channel interference, diversity techniques using multiple
receive and transmit antennas have been proposed [1] and it has
been found that the performance improvements obtained by
using these schemes can be significant.

There appears to be a trend towards increasing the portion
of wireless receivers that is implemented using digital
technology in many applications. Recent improvements in
electronic technology indicate that all-digital receivers are
becoming practical at many frequencies of interest, and further
improvements in the speed of analog-to-digital converters are
expected to continue this trend. These facts indicate that
multiple antenna diversity schemes that combine quantized
samples should be considered, as illustrated in Figure 1.


Figure  1:  Distributed  diversity  combining. 

In Figure 1, each individual receiver makes a multi-bit decision
about which symbol was sent based only on the observations
available at the co-located antenna. These decisions are then
transmitted to a single location where a final decision is made.
This is equivalent to a distributed signal detection problem. Two
studies of diversity schemes based on combining single bit
decisions made at several antennas have been reported [2, 3]. More
recently, we investigated the optimum design of multi-bit decision
schemes. If the quantizations produce samples with enough bits of
resolution then the entire scheme will closely resemble the
diversity schemes considered for analog receiver implementations
[1], which include the majority of research in this area. However,
based on our recent results, it appears that it is not always
necessary to use such high resolution quantizations. Using coarse
quantizations, with only a few bits of resolution, could reduce cost
and complexity considerably. Our recent work indicates that coarse
quantizations can sometimes be used without noticeable loss in
performance provided one uses the proper quantizer designs.

°This material is based upon work supported by the National
Science Foundation under Grant No. MIP-9211298

We considered cases with independent fading (and noise)
from antenna to antenna, a case of considerable interest [1].
Further, we considered a communications system where
noncoherent binary frequency shift keying (FSK) is to be
employed. Assume that N receivers, each with an associated
antenna, are to be employed to achieve a diversity gain. A
nonselective fading channel is considered where the fading is
slow enough that it can be assumed constant over several bit
periods. In our explicit examples, Rayleigh fading is assumed.
The observations at each receiver are assumed to include
additive zero-mean Gaussian noise.

Each of the receivers will generate a multiple bit decision
and a single final decision will be made by fusing the decisions
from the individual receivers. Assume that synchronization
between the individual receiver decisions has been achieved,
so that each set of receiver decisions corresponds to the same
transmitted digit. We assume that an accurate estimate of the
signal-to-noise ratio is obtained for the observations available
at each receiver. Here we consider the case where these
estimates can be assumed to be correct, as a first approximation.

We  have  outlined  the  optimum  design  of  such  a  system 
and  we  compared  the  performance  of  this  system  to  a  system 
which  uses  infinite  precision.  Our  results  indicate  that  using 
only  two  or  three  bits  in  the  individual  decisions  does  not 
sacrifice  much  performance,  while  this  can  simplify  receiver 
design  and  construction.  This  appears  to  be  an  important 
result  which  could  be  used  to  reduce  the  implementation  cost 
of  wireless  receivers.  Due  to  the  expansion  in  this  industry, 
we  believe  these  results  could  have  significant  impact. 
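
The comparison between coarse and infinite-precision decisions can be explored with a small Monte Carlo experiment. The sketch below is only illustrative: it assumes square-law detection of noncoherent binary FSK with unit noise power, a uniform clipped quantizer (not the optimum quantizer design discussed above), and a fusion rule that simply sums the quantized statistics. All function names and the clipping level are hypothetical choices.

```python
import random

def quantize(x, bits, clip):
    """Uniform quantizer on [-clip, clip] with 2**bits levels (assumed design)."""
    levels = 2 ** bits
    step = 2 * clip / levels
    idx = int((x + clip) // step)
    idx = min(max(idx, 0), levels - 1)      # clip out-of-range inputs
    return -clip + (idx + 0.5) * step       # midpoint of the chosen cell

def error_rate(n_ant, snr_db, bits, trials=20000, seed=1):
    """Estimated error rate when tone 1 is sent; bits=None means no quantization."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    clip = 2 * (1 + snr)                    # hypothetical clipping level
    errors = 0.0
    for _ in range(trials):
        total = 0.0
        for _ in range(n_ant):
            # Square-law outputs under Rayleigh fading are exponentially distributed.
            sig = rng.expovariate(1 / (1 + snr))   # tone carrying the signal
            alt = rng.expovariate(1.0)             # noise-only tone
            stat = sig - alt
            total += stat if bits is None else quantize(stat, bits, clip)
        if total < 0:
            errors += 1
        elif total == 0:
            errors += 0.5                   # break ties with a fair coin on average
    return errors / trials

for bits in (None, 1, 2, 3):
    print(bits, error_rate(3, 10.0, bits))
```

With one bit per antenna this reduces to a majority vote over hard decisions; more bits let the fusion sum approach the unquantized combiner.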
Abstract — In this paper we consider a different coding
scheme for direct-sequence spread-spectrum (DS-SS).
The Nordstrom-Robinson (NR) code, a nonlinear
code that has large distance for a given rate, is
examined in a trellis-coded version [2].
A bound is developed on the error probability for this
trellis coded Nordstrom-Robinson (TCNR) code with
noncoherent reception over a frequency-nonselective
Rayleigh or Rician fading channel with additive white
Gaussian noise. This bound is tighter than a standard
union bound. Our results indicate that the standard
union bound can be significantly different from
the more accurate results obtained from the improved
union bound.

I.  Introduction  and  System  Model 

In a conventional DS-SS communication system a single data
bit is transmitted using a pseudo-random sequence or its
negative and binary phase shift keying. The number of information
bits per channel chip is a measure of the rate of the system
when it is used in an environment with multiple-access
interference or multipath fading, which limits the maximum data
rate capability. An error-correcting code such as a convolutional
code or a block code can be used to provide additional
protection, usually at the expense of data rate. It is also important
to consider the number of nearest-neighbor codewords, which
affects error probability. A method to reduce the number of
nearest neighbors without sacrificing data rate is to use a
combination of an orthogonal code with a trellis at the expense of
complexity. In this paper we wish to explore a coding scheme
to achieve higher data rate and lower error probability. This
coding scheme was first introduced in [1] and analyzed for
coherent reception with multiple-access interference. A
nonlinear Nordstrom-Robinson (NR) code can also be modified
and used with noncoherent detection. This code has good
distance and rate performance, and can be efficiently decoded
with a soft decision algorithm.

If an orthogonal code has 16 codewords of length 16 with
minimum distance 8, the data rate is 4/16 (4 information bits
over 16 channel chips). Starting with this code, we can, by
adding selected orthogonal cosets to the original code, increase
the number of codewords up to 128 with the minimum
distance slightly decreasing to 6. By doing so we get the
nonlinear Nordstrom-Robinson code, which is composed of 8 cosets,
each of 16 orthogonal codewords. The NR code has the
geometric uniformity property; i.e., its distance distribution and
its weight distribution are identical. This property greatly
simplifies the analysis and simulation because the conditional
error probability does not depend on which particular
codeword is transmitted. When combined with a 4-state trellis,
this trellis coded NR (TCNR) code, an example of finite-state
codes, can transmit 6 information bits in every 16 channel
chips. Thus we have decreased the minimum distance by 25%
while having increased the rate by 50% to 6/16.
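
The figures quoted for the starting orthogonal code can be checked numerically. The sketch below assumes the orthogonal code is the set of rows of a 16 x 16 Sylvester-Hadamard matrix (a common construction, not necessarily the one used by the authors) and then reproduces the 25% and 50% figures by simple arithmetic. It does not construct the NR cosets themselves.

```python
from fractions import Fraction
from itertools import combinations

def sylvester_hadamard(order):
    """Rows of the 0/1 Sylvester-Hadamard matrix; order must be a power of two."""
    rows = [[0]]
    while len(rows) < order:
        rows = [r + r for r in rows] + [r + [1 - b for b in r] for r in rows]
    return rows

code = sylvester_hadamard(16)
d_min = min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(code, 2))

base_rate = Fraction(4, 16)                 # 16 codewords -> 4 bits per 16 chips
tcnr_rate = Fraction(6, 16)                 # 6 bits per 16 chips with the trellis
rate_gain = tcnr_rate / base_rate - 1       # relative rate increase
dist_loss = 1 - Fraction(6, d_min)          # relative distance decrease, 8 -> 6

print(len(code), d_min, 8 * len(code))      # codewords, distance, 8 cosets of 16
print(rate_gain, dist_loss)
```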

In this paper we examine the performance of this 4-state
TCNR code over a frequency-nonselective Rayleigh or Rician
fading channel with additive white Gaussian noise.
Noncoherent reception is assumed, and the codewords are assumed
to be interleaved at every 16 chips. An upper and a lower
bound on the error probability have been derived. Also, the
error performance of the TCNR code and a conventional
DS-SS code with the same data rate, 6/16, is compared.

II.  Numerical  Results  and  Conclusions 

The upper bound we derive for the TCNR code is tighter than
the standard union bound. This is because in the TCNR code
the minimum distance error events are from codewords within
the orthogonal coset, whose error probability can be calculated
exactly when the channel is assumed to be nonselective Rayleigh
or Rician fading, so that the orthogonality within each coset is
preserved. By taking this minimum distance error and then
upper bounding all the remaining error events, we get an
improved union bound. Numerical results indicate that, at high
signal-to-noise ratio (SNR), this upper bound tends to merge
with the error probability from minimum distance codewords
only, which is our lower bound. It is also shown that at high
SNR, the TCNR code has better error performance than the
conventional DS-SS code with the same data rate. For
example, our results indicate that, compared with the conventional
DS-SS code with the same data rate, there is approximately
a 4-dB gain in $E_b/N_0$ at high SNR for the TCNR code over a
Rayleigh fading channel.
Abstract — I-Q TCM is a form of coded modulation
in which two independent encoders select the
in-phase and quadrature components of the transmitted
signal. This design approach results in a significant
increase in minimum time diversity when compared
with comparable “traditional” TCM schemes. I-Q
TCM schemes of varying complexity are presented;
it is shown that the coding gains of moderately
complex systems are very close to what is expected from
the cutoff rate limit.

I.  Introduction 

The design of trellis-coded modulation (TCM) schemes for
mitigating the effects of Rayleigh-distributed flat fading has
received considerable attention. It has been pointed out that
the effective time diversity of the code (i.e., its symbol-wise
minimum Hamming distance) is the main design criterion to
optimize trellis codes for such channels [1]. TCM schemes
optimized for the Rayleigh fading channel were presented in [2]
and [3]. Most of these coding schemes use the “traditional”
Ungerboeck approach - i.e., they involve doubling the
constellation size over what is required for uncoded transmission
and the use of a rate k/(k + 1) encoder to describe valid
symbol sequences. However, if a rate k/(k + 1) code is used, the
achievable minimum time diversity, L, is upper bounded by
$L \le \lfloor \nu/k \rfloor + 1$, where $\nu$ is the number of memory
elements in the encoder. Therefore, most of the results obtained
are far short of the cutoff rate ($R_0$) limit. To achieve a higher
degree of minimum time diversity, we propose the use of I-Q TCM.

The basic idea of I-Q TCM is to use two independent
encoders in parallel to select the in-phase and quadrature
components of the transmitted sequence; this approach was used
by Ho et al. to demonstrate the feasibility of dense
constellations for fading channels. Using this approach, L is upper
bounded by $L \le \lfloor 2\nu/k \rfloor + 1$. Furthermore, no
additional decoding complexity is required; complexity here is
measured by the number of paths, excluding parallel ones, emanating
from a given state times the number of states, per information bit.
Although the proposed codes have two parallel encoders, they
have the same complexity as a code with a single encoder
with the same number of states because the number of bits
entering each encoder is reduced and independent decoding is
performed on the two decoders.
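
The effect of the two diversity bounds can be tabulated for a few encoder memories. This is a small sketch, reading the bounds as L <= floor(nu/k) + 1 for the traditional design and L <= floor(2 nu/k) + 1 for I-Q TCM; the function names and the choice k = 2 are illustrative assumptions.

```python
def max_diversity_traditional(nu: int, k: int) -> int:
    """Upper bound on time diversity for a single rate k/(k+1) encoder."""
    return nu // k + 1

def max_diversity_iq(nu: int, k: int) -> int:
    """Upper bound on time diversity with independent I and Q encoders."""
    return (2 * nu) // k + 1

k = 2  # information bits per symbol (2 bits/s/Hz)
for nu in (3, 4, 6):
    print(nu, max_diversity_traditional(nu, k), max_diversity_iq(nu, k))
```

For nu = 6 and k = 2 the bound rises from 4 to 7, which illustrates why the I-Q structure can reach a higher minimum time diversity at the same number of states.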

II.  Designed  Codes 

Codes with bandwidth efficiencies of 1, 2, and 3 bits/s/Hz and
different constraint lengths were designed. If the bandwidth
efficiency is not an even number, then the encoder operates
every two signaling intervals, producing 4-dimensional coded
signals.

1. This work was supported in part by National Science Foundation
grant NCR-8957623; also by the NSF Engineering Research
Centers Program, CDR-8803012.


I-Q TCM schemes with a bandwidth efficiency of 1 b/s/Hz
were designed using QPSK modulation; coding gains of 2-3 dB
are achieved with respect to the traditionally designed
schemes of equal complexity. Moreover, for the I-Q QPSK
64-state code, a BER of $10^{-5}$ can be achieved at
$E_b/N_0 \approx 7.5$ dB. This is only 2 dB from the cutoff rate limit.

Codes with a bandwidth efficiency of 2 b/s/Hz based on 16-QAM
were also designed. They provided coding gains of approximately 4.5
dB over 8-PSK schemes [2]. Fig. 1 shows the simulated BER
of the proposed codes and the codes from [2] for $\nu = 3, 4, 6$.
Note that a BER of $10^{-5}$ can be achieved at $E_b/N_0 \approx 10.5$ dB
using the 64-state code. This is only 2.5 dB from the cutoff
rate limit for 16-QAM signaling.

In this talk, both simulation and analytical results regarding
the BER performance of the proposed codes will be
presented. In addition, the effect of a non-uniform signal
constellation and space diversity reception will be considered.


Figure 1: Comparison of proposed codes and codes from
[2] for a bandwidth efficiency of 2 bits/s/Hz. The solid
lines represent the proposed codes; the dashed lines
represent codes from [2].
Abstract - In personal communications systems,
users contend for the resources of frequency and time,
with re-use determined by spatial separation, power
allocation, antenna beam patterns, data rate, and the
required signal to interference ratio for reliable
operation. We describe a highly adaptive system with
distributed control, i.e., each link is optimized independently,
with coupling only via the mutual interference.

I.  SUMMARY 

A high-performance experimental radio transceiver is under
development at UCLA. It will include frequency hopping,
variable bit and power allocation, channel coding, adaptive
equalization, coherent M-QAM signaling, rapid channel
probing, and adaptive transmitter and receiver antenna
arrays. Most control functions will be distributed, requiring
at most feedback between a transmitter and its intended
receiver. Each link independently attempts to achieve the
highest possible signal to interference ratio (SIR). The
challenge is to design a set of adaptive algorithms that will
interact in a stable fashion, while increasing the robustness and
throughput of the network. We outline the major adaptive
subsystems below.

In a radio network, frequency and time slots (channels) may
be re-used at some distance due to propagation losses. In
dynamic power and channel allocation (DPCA) algorithms,
channels and transmitter powers are assigned to users so that
all members of the network meet their own SIR requirement
for reliable communication. A distributed DPCA algorithm
has been developed, with the property that active users are
protected from being dropped, at the cost of slightly reduced
throughput relative to centralized control. In essence, new
users may increase their power less aggressively than active
users, and drop out when making little progress in their SIR.
Convergence can be improved by probing the channel. The
combination of the present SIR at a given power level, and
the “resistance” of the system in the form of increased
interference to each power increment, is used in making a
prediction of the final SIR.
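
To make the idea of distributed control concrete, the sketch below shows a generic target-tracking power update in which each link scales its own power by the ratio of its target SIR to its measured SIR, using only its own feedback. This is a textbook-style iteration substituted for illustration, not the DPCA algorithm described above; the gain matrix, noise level and target are hypothetical.

```python
def sir(p, gain, noise):
    """SIR of every link; gain[i][j] is the path gain from transmitter j to receiver i."""
    out = []
    for i in range(len(p)):
        interference = noise + sum(gain[i][j] * p[j] for j in range(len(p)) if j != i)
        out.append(gain[i][i] * p[i] / interference)
    return out

def power_control(gain, noise, target, steps=100):
    """Each link independently rescales its power toward the target SIR."""
    p = [1.0] * len(gain)
    for _ in range(steps):
        s = sir(p, gain, noise)
        p = [target / s_i * p_i for s_i, p_i in zip(s, p)]
    return p

# Hypothetical three-link network with weak cross-coupling.
G = [[1.0, 0.1, 0.1],
     [0.1, 1.0, 0.1],
     [0.1, 0.1, 1.0]]
powers = power_control(G, noise=0.01, target=2.0)
print(powers, sir(powers, G, 0.01))
```

The iteration converges here because the target is feasible for this gain matrix; with an infeasible target the powers would grow without bound, which is the situation admission rules are meant to prevent.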

In a frequency hopped system, we access many channels,
and probe to predict the final SIR in each. We may then
assign bits and power to maximize the throughput for this
expected SIR distribution. Channel coding with interleaving
across the frequency slots serves to realize the frequency
diversity, provides coding gain, and some smoothing of
small SIR estimation errors. We may arrange the frequency
hops to be synchronous among cells, so that the same set of
interferers is encountered on each hop. DPCA then reduces
to the single channel form. A second option is to randomize
the hopping patterns among the different cells. A
combination of bit allocation and coding then produces a hybrid mix
of interference averaging and interference avoidance, since
we may choose not to allocate any power to those hops with
large resistance. Simulations for the simpler case of choosing
M out of N hops with equal bit allocation reveal that
network throughput is very similar to the first option. However,
this procedure is more robust with respect to channel
variations since the set of channels occupied can be slowly
changed, with the effect on any other being small since there
is mutual interference in only one hop. We are also
investigating hybrid fixed assignment/DPCA schemes, which
alleviate certain difficulties that arise in admission and handoff.

Antenna arrays may also be used to suppress multipath and
reduce interference. We propose to adapt both transmitter
and receiver arrays using least squares techniques. Switching
between sets of fixed beam patterns is not feasible for indoor
systems, since we must gain some compromise benefit
between diversity combining and interference cancellation,
and the multipath has a very wide angular spread.
Additionally, the human body interacts with the terminal to change
the beam pattern. Another interaction is that between
different pairs of communicating users. As the transmitter pattern
of one array changes, so do the receiver patterns for all users
in the vicinity. This in turn affects their transmitter patterns,
as the latter may only be adapted based on the received
signals. The antenna patterns must also react to changes in the
power levels and/or channel assignments of the other users
in the network. Thus, for indoor applications the interaction
of these adaptive loops may be the dominant factor in the
channel dynamics, rather than motion of the radios. We have
investigated the dynamics of an adaptive antenna array with
a variety of equalizer and transmitter adaptation options,
with the conclusion that the ordinary LMS algorithm should
be adequate. The imposition of orthogonality among
channels within a cell together with the minimum SIR
requirement for links to be declared feasible serve to decouple the
users. The antenna arrays should be adapted on a time scale
faster than power control, since the antenna gain affects the
perceived path gains between users.


1.  Supported  by  ARPA  contract  JFBI94-222/J4C942220 
Abstract - The advisability of using adaptive strategies in
channels with side information at the transmitter is considered.
Different adaptive strategies are defined for block codes (BC)
and punctured convolutional codes (PCC) and compared on
throughput and bit error probability after decoding.

I.  Introduction. 

Shannon  [1]  studied  some  communication  systems  with  side 
information  available  to  the  transmitter,  proving  the  positive  effect 
of  the  side  information  on  the  achievable  capacity.  Nevertheless,  it 
is  not  obvious  how  to  take  advantage  of  the  side  information  in 
practice.  The  authors  have  been  studying  a  system  in  which  this  side 
information  is  a  true  indication  of  the  channel  state  and 
consequently,  the  transmitter  adapts  the  parameters  of  an  error 
control  scheme  in  order  to  obtain  the  desired  error  rate,  whilst 
keeping  throughput,  complexity  and  delay  at  acceptable  levels. 

Many  different  schemes  can  be  proposed.  A  gross  distinction 
between  competing  schemes  is  based  on  whether  or  not  the  side 
information  is  embedded  in  the  transmission.  In  a  previous  paper 
[2]  both  possibilities  were  analysed  and  the  scheme  that  did  not 
transmit  the  side  information  was  identified  as  superior. 

It  is  the  aim  of  this  abstract  to  present  some  improvements  on  the 
previously  presented  adaptive  schemes  (all  of  them  based  on  BC), 
and  also  to  introduce  some  new  approaches  using  PCC. 

II.  Model  Description. 

The transmitter strategy is simply to use different codes for different
channel states. Because the block lengths are not the same for all
codes, a special metric is adopted at the receiver [3]. This metric
depends on the joint probability $P(m, y)$ of the message $m$ and the
received sequence $y$; we can write:

$$P(m, y) = P(m)\, P(x_m \mid m)\, P(y \mid x_m)$$

The main problem observed in the previous work was the
possibility of losing synchronisation. Lack of synchronisation is
detected when l blocks are found in error in m consecutive blocks.
However, in the previous approach the data was sent directly to the
data sink, whereas now it is kept until m blocks are decoded. At this
point, provided that l blocks are not in error, the first data block is
sent to the data sink. Otherwise re-synchronisation is achieved by
moving a decision window bit by bit. This technique reduces the
decoded error rate although it increases the delay and the
complexity. Another possible scheme is developed using a tree
structure. Here, a stack-like algorithm is implemented using the
previously defined metric. The advantage of this scheme is that it
achieves synchronisation automatically, provided the buffer does
not overflow. The disadvantage is that a poor metric can easily lead
to buffer overflow.
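
The loss-of-synchronisation test can be sketched as a sliding window over block decoding outcomes. The code below is a minimal illustration that assumes "l blocks in error in m consecutive blocks" means at least l errors within the most recent m blocks; the function name and the example error pattern are hypothetical.

```python
from collections import deque

def sync_lost(block_errors, l, m):
    """Index of the block at which l errors are first seen among the last m blocks, else None."""
    window = deque(maxlen=m)                 # keeps only the m most recent outcomes
    for i, bad in enumerate(block_errors):
        window.append(bool(bad))
        if sum(window) >= l:
            return i
    return None

pattern = [0, 1, 0, 0, 1, 1, 0]              # 1 marks a block decoded in error
print(sync_lost(pattern, l=2, m=4))          # detection index for a loose threshold
print(sync_lost(pattern, l=3, m=4))          # no detection for a stricter threshold
```

In the buffered scheme the first block of the window would be released to the data sink only while this test stays negative.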

Finally, a completely different scheme is proposed, this time using
PCC. The main advantage of this scheme is that rate changes can be
more gradual since they only involve a change in the puncturing
matrix. Two techniques of this type are examined depending on
whether the convolutional encoder is flushed after each block.
Clearly, when the encoder is flushed the error performance is better
and the complexity is lower but the throughput is reduced.
Without flushing the throughput increases but the decision
technique is more troublesome and incorrect decoding can result
more easily.
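
How a change of puncturing matrix changes the rate can be seen in a few lines. The sketch below assumes a rate-1/4 mother code and uses made-up puncturing patterns chosen only so that the resulting rates are 1/3, 1/2 and 7/8; they are not the patterns used by the authors.

```python
from fractions import Fraction

def puncture(coded, pattern):
    """Drop coded bits; pattern[t % period][j] == 1 keeps output j at time t."""
    out = []
    for t, symbol in enumerate(coded):
        keep = pattern[t % len(pattern)]
        out.extend(bit for bit, k in zip(symbol, keep) if k)
    return out

def punctured_rate(pattern):
    """Information bits per transmitted bit for a rate-1/n mother code."""
    return Fraction(len(pattern), sum(sum(col) for col in pattern))

# Hypothetical patterns for a rate-1/4 mother code.
P_1_3 = [(1, 1, 1, 0)]
P_1_2 = [(1, 1, 0, 0)]
P_7_8 = [(1, 1, 0, 0)] + [(1, 0, 0, 0)] * 6

for name, pat in (("1/3", P_1_3), ("1/2", P_1_2), ("7/8", P_7_8)):
    print(name, punctured_rate(pat))
```

Because only the pattern changes, the same encoder and Viterbi decoder can serve every rate, with the decoder treating punctured positions as erasures.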


Further  analysis  allows  the  complexity  of  each  scheme  to  be 
calculated  and  compared.  However,  since  the  main  source  of 
complexity  resides  in  the  decoder  and  this  depends  on  the  algorithm 
used,  a  completely  fair  comparison  is  very  difficult.  Nonetheless, 
the  scheme  with  constant  block  length  promises  the  least 
complexity,  followed,  quite  closely,  by  the  flushed  PCC  scheme.  On 
the  other  hand,  the  schemes  using  either  a  constant  number  of 
information  bits  or  a  non-flushed  PCC  are  quite  complex  due  to  the 
occasional  necessity  for  data  re-decoding  or  backtracking 
respectively. 

III.  Results  and  Conclusion. 

The schemes presented are designed to achieve an error rate under
$10^{-5}$ while working between 1 and 12 dB ($E_s/N_0$). The codes used
are tabulated here:

BC: 

Constant  block  length: 

Scheme 1.- (n,k,t) = (63,11,12), (63,24,7), (63,30,6) and (63,51,2).

Constant number of information bits:

(n,k,t) = (63,11,12), (31,11,5), (23,11,3) and (15,11,1)

Scheme 2.- Buffer technique. Scheme 3.- Tree technique.

PCC:  (Buffer  length  96) 

Original code (n,k,K) = (4,1,5), punctured to rate 1/3, 1/2 and 7/8.
Scheme 4.- Flushing. Scheme 5.- Without flushing.
Abstract - A novel maximum-likelihood hard- and 8-level
soft-decision scarce-state-transition (SST) type
syndrome-former error-trellis decoding system of
(n, n-1) convolutional codes with coherent BPSK
signals for additive white Gaussian noise channels is
proposed. The proposed system retains the same
number of binary comparisons as the syndrome-former
trellis decoding method of Yamada et al. [2]. Like the
original SST-type register-exchange Viterbi decoding
system [4], the proposed system also has the
advantage of drawing less power when implemented on
CMOS LSI chips. A combination of the two
techniques results in a less complex and low-power
decoding system.

SUMMARY 

In  Viterbi  algorithm  decoding  of  (n,  k)  convolutional 
codes,  the  decoder  carries  out  (2^-l)-ary  comparisons  at 
each  node  of  the  encoder  trellis  [1],  The  implementation 
of  the  Viterbi  decoder  becomes  impractical  for  high-rate, 
powerful  codes  as  the  number  of  operations  and  memory 
path  histories  increase.  In  a  1983  paper,  Yamada  et  al  [2] 
proposed  a  maximum-likelihood  decoding  system  for  rate- 
(n-l)fn  convolutional  codes,  and  the  system  performance 
was  studied  by  Lee  and  Farrell  [3].  The  decoding  system 
applies  the  Viterbi  algorithm  to  the  syndrome-former 
trellis  of  the  code.  Apparently,  the  number  of  trellis  states 
is  doubled,  but  the  number  of  comparisons  at  each  node  is 
reduced  to  a  binary  comparison.  Recently,  Kubota  et  al 
[4]  proposed  scarce-state-transition  (SST)  register- 
exchange  (information  bits  are  associated  with  surviving 
paths)  Viterbi  decoding  system  of  reduced  states, 
implemented  on  CMOS  VLSI  chips  and  consumed  less 
power  in  the  low  bit-error-rate  (BER)  operating  region 
when  compared  with  a  hypothetical  register-exchange  type 
of  Viterbi  decoder.  A  power  consumption  reduction  of 
40%  at  a  bit  error  rate  of  0.0001  can  be  achieved  when 
operating  at  an  information  rate  of  25  Mbit/s  [4],  and  the 
measured  power  consumption  with  increasing  channel 
noise was also reported in [4]. In this paper, we propose
a new maximum-likelihood SST-type trellis decoding
system for rate-(n-1)/n convolutional codes, called the
SST-type syndrome-former error-trellis decoding system.
Our decoding system differs from the error-trellis syndrome
decoding technique proposed by Reed et al. [5]. In their
paper, the trellis is constrained and drawn from a k-input,
(n-k)-output regulator circuit of a rate-k/n convolutional
code and is only applicable to the class of systematic codes,
whereas our syndrome-former error-trellis is drawn from the
n-input, single-output syndrome-former circuit of a
rate-(n-1)/n systematic or non-systematic convolutional code.
The  new  system  is  similar  to  the  SST-type  Viterbi 
decoding  system  [4]  in  that  it  has  the  advantage  of 
drawing  less  power  when  implemented  on  CMOS  chips 
and operated in a low BER condition. Like the Yamada
decoding system [2], the new system retains a
binary comparison at each trellis node and significantly
reduces the decoding complexity. Combining the
two techniques results in a less complex, low-power
decoding system.
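
As a rough illustration of why a syndrome former is attractive in the low-BER region, the sketch below computes the syndrome of a rate-1/2 code (the n = 2 case of rate-(n-1)/n), with GF(2) polynomials stored as integer bit masks. The generators 1 + D + D^2 and 1 + D^2, the test input, and all function names are assumptions chosen for the example; they are not taken from the paper.

```python
def gf2_mul(a: int, b: int) -> int:
    """Multiply two GF(2) polynomials stored as bit masks (bit i = coefficient of D^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


# Hypothetical example generators: G1(D) = 1 + D + D^2, G2(D) = 1 + D^2.
G1, G2 = 0b111, 0b101


def encode(u: int) -> tuple:
    """Rate-1/2 feedforward encoding: v1 = u*G1, v2 = u*G2."""
    return gf2_mul(u, G1), gf2_mul(u, G2)


def syndrome(r1: int, r2: int) -> int:
    """Two-input, single-output syndrome former: s = r1*G2 + r2*G1 over GF(2)."""
    return gf2_mul(r1, G2) ^ gf2_mul(r2, G1)


u = 0b1011001                         # arbitrary information polynomial
v1, v2 = encode(u)
clean = syndrome(v1, v2)              # error-free reception
noisy = syndrome(v1 ^ (1 << 3), v2)   # single error D^3 on the first stream
```

With no channel errors the syndrome is identically zero whatever the data, so a decoder driven by the syndrome stays on the all-zero path most of the time at low BER; a single error on the first stream produces the syndrome D^3 G2(D).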

The simulated bit error probability performance of the
proposed hard- and 8-level soft-decision decoding system
(the soft-decision model is shown in Figure 1) is presented
for additive white Gaussian noise channels. Furthermore,
the implementation complexity of the new decoding system is
compared with that of the SST-type
register-exchange Viterbi decoding system.


[Figure 1: block diagram. Block labels recoverable from the source: convolutional encoder, discrete noisy channel, 3-bit natural binary quantiser (two instances), inverter G^-1(D) (two instances), delay, re-encoder.]

Fig. 1 Model of an eight-level soft-decision SST-type
syndrome-former error-trellis decoding system
Abstract - Sequential decoding of trellis codes at high spectral
efficiencies is investigated and large constraint length trellis
codes for two dimensional and four dimensional constellations
are constructed for use with sequential decoding. It is shown
that the channel cut-off rate bound can be achieved using
constraint lengths between 16 and 19 with sequential decoding
at a bit error rate of 10^-5 to 10^-6.

I.  INTRODUCTION 

Recently,  it  has  been  shown  that  sequential  decoding  is  a  good 
alternative  to  Viterbi  decoding  for  trellis  codes  and  significant 
coding  gains  can  be  achieved  using  sequential  decoding  with 
large  constraint  length  trellis  codes  compared  to  Viterbi 
decoding with small constraint lengths [1]-[3]. The channel
cut-off rate R_0 is the maximum rate at which the average number
of computations for sequential decoding is bounded. Thus, R_0
is regarded as the maximum rate for which reliable
communication can be achieved with reasonable complexity.
Trellis codes for 8-PSK and 16-QAM constellations with large
constraint lengths were constructed for use with sequential
decoding in [3,4]. For these constellations, it was shown that
the channel cut-off rate bound can be achieved using large
constraint length codes with sequential decoding at a bit error
rate (BER) of 10^-5 to 10^-6 on additive white Gaussian noise
(AWGN) channels [3]. In this paper, we discuss the
construction  of  trellis  codes  at  higher  spectral  efficiencies  for 
use  with  sequential  decoding. 
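
For orientation, the cut-off rate of an AWGN channel with a finite constellation can be evaluated numerically. The sketch below uses the textbook expression for equiprobable signal points, R_0 = -log2[(1/M^2) sum_i sum_j exp(-|x_i - x_j|^2 / (4 N_0))]; the abstract does not state this formula, so the equiprobable-input assumption, the unit-energy 8-PSK example, and the function name are all assumptions made for illustration.

```python
import cmath
import math


def awgn_cutoff_rate(points, n0):
    """Cut-off rate R_0 in bits per 2-D symbol, assuming equiprobable points.

    points: complex constellation points; n0: noise spectral density N_0.
    """
    m = len(points)
    total = sum(
        math.exp(-abs(a - b) ** 2 / (4.0 * n0))
        for a in points
        for b in points
    )
    return -math.log2(total / (m * m))


# Unit-energy 8-PSK, chosen only as an example constellation.
psk8 = [cmath.exp(2j * math.pi * k / 8) for k in range(8)]

r0_high_snr = awgn_cutoff_rate(psk8, 1e-4)  # approaches log2(8) = 3 bits
r0_low_snr = awgn_cutoff_rate(psk8, 1e6)    # approaches 0 bits
```

The two limiting values bracket the operating range: a code of rate R can be sequentially decoded with bounded average computation only where R stays below the R_0 curve.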

II. SEQUENTIAL DECODING AND THE FANO METRIC

The  calculation  of  the  Fano  metric  at  high  spectral  efficiencies 
and  for  multidimensional  signal  constellations  is  discussed. 
We  show  that  the  computation  of  the  Fano  metric  for 
multidimensional  signals  can  be  decomposed  into  a  simpler 
calculation  for  the  constituent  two  dimensional  signals,  and 
thus  that  the  computational  complexity  of  decoding  a 
multidimensional  trellis  code  using  sequential  decoding  is 
comparable  to  decoding  a  two  dimensional  trellis  code. 
Simulation  results  show  that  the  computational  distribution  for 
sequential  decoding  of  multidimensional  trellis  codes  at  high 
spectral  efficiencies  can  be  very  well  approximated  by  a 
Pareto  distribution.  This  implies  that  the  code  construction 
criteria  for  trellis  codes  with  sequential  decoding  derived  for 
small  spectral  efficiencies  can  also  be  applied  to  the 
construction  of  trellis  codes  at  high  spectral  efficiencies. 

III. CODE CONSTRUCTION

The  Random  Search  (RS)  algorithm  proposed  in  [3]  is 
investigated  and  modified  to  construct  trellis  codes  at  high 
spectral  efficiencies.  This  work  was  motivated  by  the  random 
coding  principle  that  an  arbitrary  selection  of  code  symbols  will 
produce  a  good  code  with  high  probability.  In  the  code 
construction  algorithm,  the  sequential  decoding  performance 
was used as the criterion for selecting good codes. Thus the
algorithm  works  well  as  long  as  the  performance  of  the  code 
can  be  evaluated  using  sequential  decoding. 

In practice, rotational invariance is a desirable property. It
allows the decoder to synchronize quickly at startup or after a
phase slip. In [5], a simple method was proposed to check the
rotational invariance of a given code. It was shown that
rotationally invariant trellis codes with large constraint lengths
can be found in a systematic way. In the modified RS (MRS)
algorithm, this method is used to ensure that rotationally
invariant trellis codes are found.

IV.  RESULTS  AND  DISCUSSIONS 

The  MRS  algorithm  was  used  to  construct  two  dimensional  and 
four  dimensional  trellis  codes.  180°  rotationally  invariant  linear 
trellis  codes  for  two  dimensional  constellations  with  constraint 
lengths  16-19  were  obtained.  Simulation  results  show  that  the 
cut-off  rate  bound  can  be  achieved  using  sequential  decoding 
with a constraint length 16 code at a BER of 10^-5 and with a
constraint length 19 code at a BER of 10^-6. Similarly, 180°
rotationally  invariant  linear  trellis  codes  and  90°  rotationally 
invariant  nonlinear  trellis  codes  for  four  dimensional 
constellations  were  found  using  the  MRS  algorithm.  The 
partitioning  and  labeling  of  the  four  dimensional  constellations 
are the same as Wei's [6]. It was also shown that the channel
cut-off rate bound can be achieved using sequential decoding
with four dimensional codes using constraint lengths between
16 and 19 at BERs of 10^-5 to 10^-6.
An efficient fault-detecting methodology,
algorithm-based  fault  tolerance,  may  be  extended  to 
include  error  correction  of  the  output  data  in  a 
protected  linear  processing  system  by  coupling  a 
high-rate  real  convolutional  code  with  a  smoothed 
Kalman  recursive  estimation  technique  [1].  A 
completely  protected  fault-tolerant  linear  processing 
system  involving  error  correction  is  shown  in 
Figure 1, where it is guaranteed that no miscorrected
data leave the configuration if at most one box-surrounded
subsystem fails at a time. The real
convolutional  code  dictates  the  comparable  parity 
streams  computed  in  two  ways,  forming  the 
syndrome  stream  that  is  passed  to  the  Fixed-Lag 
Corrector  when  values  exceed  threshold  settings. 
The  block  processing  and  down  sampling  features 
of  the  convolutional  code  permit  the  overhead  area 
to  be  from  20-50%  of  the  main  processing  area. 


The  reliability  function  of  the  protected  system  is 
calculated  when  failures  are  assumed  to  arrive 
according  to  a  Poisson  process  with  uniform  rate 
per  unit  area.  Arrivals  in  the  main  processing  part 
are  assumed  independent  of  those  in  the  protection 
overhead  parts  leading  to  respective  arrival  rates  a 
and  b  as  shown  in  Figure  1.  The  reliability  levels 
are  computed  using  iterated  integrals  over 
appropriate  regions  and  conditional  probability 
expansions.  The  guard  space  of  the  convolutional 
code  is  described  by  parameter  c.  Reasonable  lower 
bounds  on  the  reliability  levels  which  depend  only 
on  the  arrival  rates  a  and  b  and  guard  parameter  c 
are  established  by  bounding  individual  conditional 
events. 

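$$R(t) \ge e^{-(a+b)t}\Bigl\{\,1 + bt + \sum_{n}\ \int_{\xi_1=0}^{t-nc}\ \int_{\xi_2=\xi_1+c}^{t-(n-1)c}\cdots\int_{\xi_i=\xi_{i-1}+c}^{t-(n-i+1)c}\cdots\int_{\xi_n=\xi_{n-1}+c}^{t-c}\,[1+b\xi_1]\,\Bigl[\prod_{j}\bigl[1+b(\xi_j-\xi_{j-1}-c)\bigr]\Bigr]\,[1+b(t-\xi_n-c)]\;d\xi_n\cdots d\xi_1\Bigr\}$$

(The limits of the sum over n, the limits of the product over j, and any coefficient of the n-th term are not legible in the source.)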

These  reliability  bounds  are  easily  calculated 
employing  a  standard  computer  algebra  package  on 
a  workstation.  Typical  results  are  shown  in 
Figure  2  for  a  real  code  based  on  a  binary  rate  5/6 
Berlekamp-Preparata-Massey  burst  correcting  code. 
The  failure  intensities  (FITS)  of  rates  a  and  b  are 
indicated  and  the  guard  space  parameter  c  is  related 
to a 30 ns clocking period. The results for a coded
versus uncoded system are displayed separately
because of the large differences in scales of the
reliability logarithms. There is a dramatic
improvement in levels due to correction even when
the additional area of the protection overhead is
included.
[Figure 2: two reliability plots against t (hours), one for the coded system and one titled "R_un(t) = Reliability of Uncoded Processing System Only".]

PARAMETERS:

a = 2 x 10^-6 per hour = 2000 FITs ; Processing Section Failure Rate
b = 400, 600, 800, 1000 FITs ; Corrector Section Failure Rate
T = 30 ns ; Processing Sample Period

Figure 2: Example Comparisons of Reliabilities for Coded and Uncoded Systems
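
To make the units concrete, the sketch below converts FIT rates to per-hour Poisson rates (1 FIT = one failure per 10^9 device-hours) and evaluates two quantities that follow directly from the text: the uncoded reliability exp(-a t), and the first two terms of the lower bound, exp(-(a+b) t)(1 + b t). Dropping the remaining integral terms is an assumption made here for simplicity; since those terms are non-negative, the truncated expression is a looser but still valid lower bound. Function names are illustrative.

```python
import math

FIT_TO_PER_HOUR = 1e-9  # 1 FIT = 1 failure per 10^9 hours


def uncoded_reliability(t_hours: float, a_fits: float) -> float:
    """Probability of no Poisson arrival in the processing section up to time t."""
    return math.exp(-a_fits * FIT_TO_PER_HOUR * t_hours)


def truncated_coded_bound(t_hours: float, a_fits: float, b_fits: float) -> float:
    """Leading terms e^{-(a+b)t} (1 + b t) of the reliability lower bound."""
    a = a_fits * FIT_TO_PER_HOUR
    b = b_fits * FIT_TO_PER_HOUR
    return math.exp(-(a + b) * t_hours) * (1.0 + b * t_hours)


# Parameters from Figure 2: a = 2000 FITs, b = 400 FITs.
r_un = uncoded_reliability(5.0e5, 2000)        # a*t = 1, so exp(-1)
r_lb = truncated_coded_bound(5.0e5, 2000, 400)
```

The truncated bound keeps only the events with no failures at all, or failures confined to the corrector section as counted by the b t term, so it cannot by itself show the improvement reported in Figure 2; the omitted integral terms account for the corrected failures.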
Abstract - A bidirectional Viterbi decoding algorithm for
framed information with repeat request, which is an extended
version of the Yamamoto-Itoh scheme, is presented. A
method to estimate the unreliable region in a received frame
using the proposed algorithm is also presented.

I. BIDIRECTIONAL VITERBI DECODING
ALGORITHM  WITH  REPEAT  REQUEST 

Yamamoto  and  Itoh  proposed  a  convolutionally  coded 
ARQ  scheme  with  Viterbi  decoding  in  order  to  improve  the 
reliability  of  convolutional  coding/Viterbi  decoding  [1].  In 
this  scheme,  all  survivors  are  labeled  either  GOOD  or  BAD 
and  retransmission  is  requested  if  all  survivors  are  labeled 
BAD.  On  the  other  hand,  it  is  known  that  a  string  of  received 
symbols  corresponding  to  a  frame  of  information  bits 
augmented  by  tail  bits  can  be  decoded  from  both  directions 
simultaneously [2], [3]. Taking these facts into
consideration, we propose a bidirectional Viterbi decoding
algorithm for framed information with repeat request.
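
The GOOD/BAD labeling can be sketched as follows. The abstract does not spell out the labeling rule, so the sketch assumes the usual description of the Yamamoto-Itoh scheme: a survivor stays GOOD only if its predecessor was GOOD and its metric beats the runner-up at that node by at least a threshold. The class, the function names, and the "larger metric is better" convention are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    metric: float  # larger is better in this sketch
    good: bool     # True = GOOD, False = BAD


def select_survivor(candidates, threshold):
    """Pick the best path merging into a node and update its label.

    Assumed rule: the survivor is GOOD only if it was GOOD before and
    leads the second-best candidate by at least `threshold`.
    """
    ranked = sorted(candidates, key=lambda p: p.metric, reverse=True)
    best = ranked[0]
    margin = best.metric - ranked[1].metric if len(ranked) > 1 else float("inf")
    return Path(best.metric, best.good and margin >= threshold)


def retransmission_needed(survivors):
    """Retransmission is requested when every survivor is labeled BAD."""
    return all(not p.good for p in survivors)


clear_win = select_survivor([Path(10.0, True), Path(4.0, True)], threshold=3.0)
close_call = select_survivor([Path(10.0, True), Path(9.0, True)], threshold=3.0)
```

In the bidirectional scheme the same labeling would run independently in the forward and reverse decoders, each with its own survivor set.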

In this scheme, a received frame is accepted if the forward
and reverse decoders can meet without declaring
retransmission in the course of decoding. The ML path is
decoded by a trace-back method or some equivalent one, after
determining the node x_0 on the ML path at the point of
junction. The proposed scheme is most efficiently applied in
the case where one of the two decoders (let this decoder be
the forward one) stops at some node level t_0 declaring
retransmission, while the other decoder (i.e., the reverse
decoder) can proceed up to level t_0. In this case, only a part
of the frame (i.e., [0, t_0]) needs to be retransmitted. For the
retransmitted data, the two decoders resume decoding from
both directions (note: the reverse decoder can continue
decoding operations using the survivors and their metrics
computed till then). If the same situation happens after the
first retransmission, partial retransmission is requested again
and the procedure is repeated until the two decoders can join.
It is derived analytically that the average quantity of
retransmission per frame in the proposed scheme is
approximated by (L'/4)p_x (L': length of a coded frame, p_x:
probability of retransmission per frame).


II. ESTIMATION OF UNRELIABLE REGION
When retransmission is requested at node level t_0 in the
Yamamoto-Itoh scheme, we know that some noisy region
has started before t_0 in the corresponding trellis. That is, the
noisy region in the received data has been roughly estimated
in "one" direction. Making use of this fact, we show that the
unreliable region is estimated as an "interval" by using a
bidirectional Viterbi decoding algorithm with repeat request,
especially in the case where the two decoders, one with a flag
of retransmission and the other without it, can join in the
course of decoding. In such a case, by tracing the ML path
forward and backward, we can find the node x_i* (level t_i*) at
which the label of the ML path turns BAD for the first time
and the node x_i** (level t_i**) at which the second best path for
node x_i* has diverged from the ML path in forward (i=1) and
reverse (i=2) decoding. Then the interval [t_1**, t_2**] is regarded
as an unreliable region. The relation between t_i* and t_i** is
given by the following lemma:

<Lemma> t_1** <= t_1* and t_2* <= t_2**.

In order to realize the above idea, we incorporate a new
bidirectional Viterbi decoding scheme into the proposed
algorithm. In this scheme, only the metrics of the survivors
are remembered until the two decoders join, and after that they
serve as preliminary computations for determining the nodes
on the ML path. It is shown that the scheme is very
convenient for tracing the ML path forward or backward.
Abstract - A reduced complexity algebraic type
algorithm is described for decoding of convolutional
codes over GF(q), q > 2. It is founded on the same
principles as algebraic-sequential decoding [1, 2]. It
is proved that for large q, the algorithm has better
complexity-reliability tradeoff than the conventional
Viterbi algorithm.

SUMMARY 


Let us consider a discrete symmetric memoryless channel
(DSMC) with input and output alphabets A = {0, 1, ..., q-1},
where q > 2 is a prime or a power of a prime. By definition
of a DSMC, each output symbol of the channel depends
only on the corresponding input. The conditional probability
p_ij of receiving symbol j, j ∈ A, provided that the symbol i,
i ∈ A, has been transmitted, is given by

$$p_{ij} = \begin{cases} 1-\epsilon, & \text{if } i = j,\\ \epsilon/(q-1), & \text{otherwise.} \end{cases}$$


Let v = (v_0, v_1, ...) = (v_01, v_02, ..., v_0c, v_11, v_12, ..., v_1c, ...)
be the output (code) sequence at the output of the
convolutional rate R = b/c memory m encoder, and let u =
(u_0, u_1, ...) = (u_01, u_02, ..., u_0b, u_11, u_12, ..., u_1b, ...) be the
input (data) sequence. Then v = uG, where G is a semi-infinite
generator matrix having b x c submatrices as elements.
All elements of v, u and G are elements in GF(q)
and all operations are performed over GF(q). Let r =
(r_0, r_1, ...) = (r_01, r_02, ..., r_0c, r_11, r_12, ..., r_1c, ...) be the
received sequence, r_ij ∈ GF(q).

We introduce the binary error locator sequence l =
(l_0, l_1, ...) = (l_01, l_02, ..., l_0c, l_11, l_12, ..., l_1c, ...), l_ij ∈ {c, e},
where l_ij = c ("correct") if r_ij is received correctly and l_ij = e
("erroneous") otherwise. A sequence l is considered as survived
if there exists a code sequence v whose symbols coincide
with the symbols of r in the positions where l has symbol
c and not in the other positions. If the decoder knows the error
locator sequence, it can correctly decode the information
sequence, if it can do maximum-likelihood (Viterbi) decoding.

The set of survived error locator sequences can be represented
as a set of paths in a binary error locator tree. The
decoding algorithm can then be treated as a search algorithm
in the error locator tree. We consider a list-decoding type
algorithm: In every decoding step the decoder selects the L
most likely sequences in the error locator tree and calculates
their survived successors.

To characterize the algorithm we introduce the characteristic
parameter z = (1 - R)/log_q(q - 1) and the effective
decoding distance d_ef, which plays the same role as the
free distance does for the Viterbi algorithm: The decoder
corrects all combinations of ⌊(d_ef - 1)/2⌋ or fewer errors.


Theorem 1: There exists a rate R q-ary time-invariant
convolutional code whose effective distance resulting from
algebraic type Viterbi decoding of list size L is lowerbounded
by the inequality

$$d_{ef} > \frac{z \log_2 L}{h_2(z)} + \text{const},$$

where h_2(z) = -z log_2 z - (1 - z) log_2(1 - z) and the constant
does not depend on L.

Comparison  with  the  Costello  bound  for  the  free  distance  of 
convolutional  codes,  shows  that  for  large  q  the  algebraic  type 
Viterbi  decoding  gives  essentially  better  complexity-reliability 
tradeoff  than  conventional  Viterbi  decoding. 

Using a modified random coding technique we obtain a random
coding bound and an expurgation bound for the probability
of decoding error for the algorithm.

To formulate the expurgation bound, we introduce the algebraic
computational cutoff rate:

$$R_0^{(a)} = \max\{\,1 - z_0 \log_q(q-1),\ R_0\,\},$$

where z_0 is the largest root of the equation

zlog  +  (1  -  z)log  1 — L  =  0 

and R_0 is the "conventional" computational cutoff rate of the
DSMC:

$$R_0 = 1 - 2\log_q\bigl\{(1-\epsilon)^{1/2} + [\epsilon(q-1)]^{1/2}\bigr\}.$$
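
The channel model and the conventional cutoff rate are easy to check numerically. The sketch below builds the q-ary DSMC transition matrix (1 - ε on the diagonal, ε/(q-1) elsewhere) and evaluates R_0 = 1 - 2 log_q{(1-ε)^(1/2) + [ε(q-1)]^(1/2)}; the function names and the q = 4 example values are illustrative assumptions.

```python
import math


def dsmc_matrix(q: int, eps: float):
    """Transition probabilities p_ij of the q-ary symmetric memoryless channel."""
    off_diagonal = eps / (q - 1)
    return [
        [1.0 - eps if i == j else off_diagonal for j in range(q)]
        for i in range(q)
    ]


def conventional_cutoff_rate(q: int, eps: float) -> float:
    """R_0 of the DSMC, measured in q-ary units per channel symbol."""
    inner = math.sqrt(1.0 - eps) + math.sqrt(eps * (q - 1))
    return 1.0 - 2.0 * math.log(inner, q)


p = dsmc_matrix(4, 0.1)
r0_noiseless = conventional_cutoff_rate(4, 0.0)   # error-free channel
r0_useless = conventional_cutoff_rate(4, 0.75)    # eps = (q-1)/q, uniform output
```

The two extremes behave as expected: an error-free channel gives R_0 = 1, and at ε = (q-1)/q, where the output is independent of the input, R_0 falls to 0.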

Theorem 2 (expurgation bound): For a q-ary DSMC,
there exists a rate R q-ary time-invariant convolutional code
and an L-list algebraic type decoder, whose burst error
probability is upperbounded by

Pe  <  R  < 

where 

Z  log  y/e  -f  (1  —  z)  log  y/l  —  € 
h2(z) 

Theorem 3 (random coding bound): For a q-ary DSMC,
there exists a rate R q-ary time-invariant convolutional code
and an L-list algebraic type decoder, whose burst error
probability is upperbounded by

pe  <  r^(R)+o(1),  R>R(0a), 


where 

*i°g2  f  +  (i  -  z)iog2  jrf 

h2(z) 
Abstract - A new concept of λ-order orthogonalization
and a general threshold decoding method are
introduced. Many codes decodable using general
threshold decoding can be constructed and are superior in
performance to other majority-logic decodable codes.

I.  INTRODUCTION 

Many  efforts  have  been  made  [1]  [2]  in  trying  to  decode  more 
codes  using  majority-logic  decoding  with  nonorthogonal  parity- 
check  sums.  This  approach,  however,  requires  more  parity-check 
sums  for  each  error  digit  than  orthogonal  majority-logic 
decoding,  and  is  only  applicable  to  a  small  class  of  codes. 
Rudolph  and  Robinson  [3]  claimed  that  any  decoding  function  for 
a  linear  binary  code  can  be  realized  as  a  weighted  majority-logic 
decoding. However, each weight element in this scheme is a
function of all the 2^(n-k) parity-check sums, so it will involve a
large number of computations in addition to the majority-logic
operation.  The  method  presented  in  this  paper  can  be  viewed  as 
an  alternative  to  applying  the  threshold  decoding  method  to 
more  types  of  codes,  but  involving  fewer  computations. 

II. THE GENERAL THRESHOLD DECODING RULE
Let syndrome digits s_0, s_1, ..., s_{n-k-1} be Boolean variables
and e_l(s_0, s_1, ..., s_{n-k-1}) the decoding function for the error
digit in the lth position of the received vector. The general
one-step threshold decoding (simply GTD) rule is defined to be

$$e_l(s_0, s_1, \ldots, s_{n-k-1}) = \begin{cases} 1, & \text{if } \sum_{j=0}^{J-1} A_j \ge T,\\ 0, & \text{if } \sum_{j=0}^{J-1} A_j < T, \end{cases} \qquad (1)$$

where A_j = a_0 s_0 + a_1 s_1 + ... + a_{n-k-1} s_{n-k-1} (a_i ∈ GF(2)) is a
parity-check sum, J is the number of parity-check sums on
e_l, and T is the threshold value.
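
A minimal sketch of rule (1), assuming the parity-check sums are indexed j = 0, ..., J-1 and that each sum is described by its GF(2) coefficient vector over the syndrome digits. The function names and the three-digit example are hypothetical and do not correspond to any particular code from the paper.

```python
def parity_check_sum(coeffs, syndrome):
    """A_j = a_0 s_0 + a_1 s_1 + ... over GF(2) (XOR of the selected syndrome digits)."""
    return sum(a & s for a, s in zip(coeffs, syndrome)) % 2


def gtd_error_digit(syndrome, checks, threshold):
    """One-step threshold decision: 1 if the integer sum of the A_j reaches T, else 0."""
    total = sum(parity_check_sum(coeffs, syndrome) for coeffs in checks)
    return 1 if total >= threshold else 0


# Hypothetical example with three syndrome digits and J = 3 check sums.
syndrome = [1, 0, 1]
checks = [
    [1, 0, 0],  # A_0 = s_0
    [1, 0, 1],  # A_1 = s_0 + s_2
    [1, 1, 0],  # A_2 = s_0 + s_1
]
estimate_t2 = gtd_error_digit(syndrome, checks, threshold=2)
estimate_t3 = gtd_error_digit(syndrome, checks, threshold=3)
```

Note that the A_j are added as ordinary integers before the comparison with T, whereas each A_j itself is computed modulo 2; a single threshold gate therefore replaces the majority vote.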

Definition: A code is said to be λ-order orthogonal if for
any set of parity-check sums, e.g., a set on e_l, e_l appears in each
parity-check sum, but no other error digit appears more than λ
(λ >= 1) times in the set.

This definition covers both orthogonal (λ = 1) and
nonorthogonal (λ > 1) cases. In the table below, the parameters
J and t_c for three decoding methods, majority-logic decoding for
orthogonalizable codes (simply orth-M-L), majority-logic
decoding for nonorthogonalizable codes (non-orth-M-L) [1] and
GTD for λ-order orthogonalizable codes (λ-orth-threshold), are
listed for the purpose of comparison.

It  is  easy  to  show  that  GTD  is  always  applicable  where 
majority-logic  decoding  can  be  used,  and  requires  fewer  parity- 
check  sums  than  the  second  case. 


        orth-M-L              non-orth-M-L            λ-orth-threshold

J       >= 2 t_c              >= 2 t_c λ              >= (2 t_c - 1) λ + 1

t_c     <= (n-1)/(2(d-1))     <= ⌊(n-1)/(2(d-1))⌋     <= (λ(n+d-1) - d)/(2λ(d-1))

III. THRESHOLD DECODABLE CODES


It  can  be  shown  [4]  that  there  are  many  codes  which  can  be 
decoded  in  one  step  using  the  rule  given  by  (1). 

* Any (2^m - 1, 2^m - m - 1) Hamming code is an (m - 1)-order
orthogonal code which can be decoded by one-step
threshold  decoding  with  the  threshold  value  T  ~m.  Hence, 
only  one  threshold  gate  is  required  for  hardware  implementations 

m- 2 

in  this  case,  instead  of  a  total  of  ^  j'  majority-logic  gates  [5] 

/-o 


when  Hamming  codes  were  originally  treated  as  (m  — l)-step 
majority-logic  decodable  codes  [6]. 

*  The  (vrf,  bd,  rd,  k'd,  /^-configurations  [7]  with 


parameters, vd  =  §  > 


(  F\ 


k'=J,X  =  J- 1,  where 


£  >  3,  3  <  J  <  |_£  /  2_|,  constructs  a  class  of  (n,  k,  d)  = 
(£\  fE'] 

(  +£,  ,4)  SEC-DED  codes  which  can  be  decoded  by 

UJ  {jj 

one-step  threshold  decoding  with  the  threshold  value  T=J, 
where  t^=n-L 


A  comparison  of  some  threshold  decodable  codes  with 
existing  majority-logic  decodable  codes,  and  a  list  of  multiple- 
error-correcting  threshold  decodable  codes,  is  given  in  [4]. 
Abstract - New trellis encoders over various lattice
partitions having optimum distance profile (ODP)
and large constraint lengths are constructed. They
are attractive to use in combination with sequential
decoding algorithms since their ODP property ensures
good computational performance.

I.  Introduction 

Trellis coded modulation (TCM) can achieve significant coding
gains over uncoded transmission without any bandwidth
expansion. For error rates of the order of 10^-6, the gap
between the Shannon limit and uncoded high-rate QAM
signaling is approximately 9 dB, being the maximum achievable
coding gain for any coded modulation scheme operating
in this region. A perhaps more realistic performance limit is
the computational cut-off rate, R_0, beyond which the average
number of computations for sequential decoding becomes
unbounded. The possible coding gain under the R_0 criterion is
7.5 dB. It can be separated into two parts, viz., fundamental
coding gain and shaping gain [1]. The maximum values of
these gains are approximately 6 dB and 1.5 dB, respectively.
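
The gain budget quoted above can be checked with a few lines of arithmetic. The sketch assumes the well-known ultimate shaping gain of πe/6 (about 1.53 dB), which the text rounds to 1.5 dB; the variable names are illustrative.

```python
import math

# Ultimate shaping gain of an N-sphere over an N-cube as N grows: pi*e/6.
max_shaping_gain_db = 10.0 * math.log10(math.pi * math.e / 6.0)

# Budget under the R_0 criterion, using the rounded figures from the text.
r0_criterion_gain_db = 7.5
shaping_gain_db = 1.5
fundamental_gain_db = r0_criterion_gain_db - shaping_gain_db  # 6 dB

# Distance between the R_0 limit and the Shannon limit in this regime.
gap_r0_to_shannon_db = 9.0 - r0_criterion_gain_db  # 1.5 dB
```

Only the fundamental part depends on the convolutional encoder and the lattice partition, which is why the search in the following sections targets that term.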

The signal constellation can be viewed as a set of 2^(n+1)
points from an infinite N-dimensional lattice Λ. A sublattice
Λ' of Λ induces a partition Λ/Λ' of Λ into |Λ/Λ'| cosets of Λ'.
The output of a rate R_c = k/(k+1) convolutional encoder is used
to select one of the 2^(k+1) cosets. Then the n - k uncoded bits
select one of the points in the specified coset. The fundamental
coding gain is determined by the convolutional encoder and
the lattice partition, whereas the shaping gain depends on the
bounding region of the constellation.

The aim of this work is to increase the fundamental coding
gain compared to current systems by increasing the number
of states in the encoder. The decoding is performed with a
sequential decoder since its decoding effort is essentially
independent of the number of states. It is well-known that the
code should have an optimum distance profile (ODP) in order
to minimize the average number of computations for the
sequential decoder.

II.  Search  for  ODP-encoders 

It is convenient to search for R_c = k/(k+1) encoders on a systematic
feedback form. The corresponding generator matrix is

$$G(D) = \bigl(I_k \mid H^i(D)/H^0(D)\bigr), \quad i = 1, \ldots, k,$$

where

$$H^i(D) = h_0^i + h_1^i D + \cdots + h_\nu^i D^\nu$$

are the parity check polynomials in the delay operator D. The
search is performed as follows:

Assume that the set of ODP-encoders of constraint length ν
is known. Form the 2^(k+1) possible extensions of every encoder
within this set and calculate their distance profiles. Retain
the encoders with the best distance profile; these form the set
of ODP-encoders of constraint length ν + 1.

1 This research was supported in part by the Swedish Research
Council for Engineering Sciences under the Grants 92-661 and 94-83.

Later  on,  we  will  require  the  encoder  to  be  on  a  feedforward 
form.  The  transformation  from  a  systematic  rational  to  a 
non-systematic  polynomial  encoding  matrix  is  performed  as 
follows: 

The encoding matrices G(D) and G_1(D) are equivalent if
G_1(D) = T(D)G(D) and T(D) is nonsingular. If T(D) is chosen
to be H^0(D) I_k, G_1(D) has a feedforward realization.

Let G_1(D) = A(D)Γ(D)B(D) be the Smith factorization
of G_1(D). Choosing G_2(D) as the k upper rows of B(D)
ensures that G_2(D) is basic and equivalent to G(D). Using
the algorithm in [2], we are now able to construct a
minimal-basic matrix G_3(D) that is equivalent to G(D).

III.  Performance  evaluation 

The distance spectra of the encoders are computed with
the FAST algorithm [3], which is considered to be very efficient.
However, it requires knowledge of the smallest number
of steps needed to drive the encoder from a certain state to
the zero state. This is very difficult to compute for an encoder
with feedback, but trivial for a feedforward encoder, which is
the reason for the encoding matrix transformation. The complexity
of the matrix transformation is small compared to the
increase in the number of node visits that would occur if another
search algorithm were employed.

In the table we give ODP-encoders over Z^2/2RZ^2 maximizing
the effective coding gain, γ_eff, at an error rate of
10^-6. Following [1], we compute the first three components
of the distance spectrum, N_i. The dominant error coefficient
is starred, and the parity check polynomials are given in octal
notation.

We will continue to search for large constraint length
ODP-encoders over 4- and 8-dimensional lattice partitions and
perform simulations of their performance.
Summary


Binary block codes have been extensively used for error detection,
and amongst the most popular we have the CRC (cyclic
redundancy check) codes [1, 5]. In contrast, convolutional
codes have been used almost exclusively for error correction
(the exception being some hybrid applications). A reason for
this is that the most popular convolutional codes have a rate
that is far too low (e.g., 1/2, 2/3) for the cases where only
error detection is desired. Furthermore, while there has been
a variety of algebraic methods to design good high-rate block
codes, the design of good high-rate convolutional codes seems
to be more difficult. Nevertheless, progress has been made in
that field [2], which opens the practical possibility of using
convolutional codes for error detection.

Clearly, one crucial requirement for the use in (only) error
detection is low decoding complexity. We show that for this
specific application the decoding complexity of convolutional
codes is practically equal to the coding complexity, which is
very small. Thus, the encoder/decoder can be implemented
directly in hardware (as exemplified in Fig. 1), or use efficient
software decoding techniques like those used for CRC error
detection codes [5]. Different encoder/decoder implementations
are considered.

By studying the properties of high-rate convolutional codes
for  the  purpose  of  error  detection,  we  show  their  potential 
advantages  over  block  codes.  For  instance,  one  fundamental 
limitation  of  block  codes  for  burst  error  detection  comes  from 
the  fact  that  the  decoder  can  only  flag  an  error  at  the  end  of 
each  data  block.  Consequently,  there  is  conflict  between  the 
minimization  of  the  probability  of  undetected  errors,  and  the 
minimization  of  the  average  error  detection  delay — which  is 
the  amount  of  time  taken  by  the  decoder  to  flag  an  error  after 
it  occurred.  To  minimize  the  delay  it  is  necessary  to  use  short 
blocks,  but,  for  a  given  code  rate,  short  block  codes  may  not 
be  powerful  enough  to  detect  long  bursts.  The  convolutional 
codes  can  be  powerful  enough  to  detect  those  error  bursts, 
and  still  flag  the  errors  with  small  delays. 

In  addition,  this  study  gives  a  deeper  view  of  CRC  codes — 
which  happen  to  be  a  special  case  in  a  class  of  codes  that  we 
call  unit-rate  convolutional  codes.  Thus,  for  the  extension  of 
CRCs  we  can  employ  techniques  used  for  convolutional  codes, 
like  the  use  of  unit-memory  [3,  2]  or  cyclic  time- varying  codes. 

Certain general error detection capabilities of the convolutional
codes are derived, as shown in the example below.

Proposition 1 The fraction FU(L, n, k, m) of error patterns
with duration L not detected by an (n, k, m) convolutional code
satisfies

$$\mathrm{FU}(L,n,k,m) \le \begin{cases} 0, & L \le m,\\[4pt] \dfrac{2^k-1}{(2^n-1)^2\,2^{n(m-1)}}, & L = m+1,\\[8pt] \dfrac{(2^k-1)^2\,2^{k(L-m-2)}}{(2^n-1)^2\,2^{n(L-2)}}, & L > m+1. \end{cases}$$

1 This work was supported by CNPq, Conselho Nacional de
Desenvolvimento Cientifico e Tecnologico, Brazil.

As explained above, CRC codes can be considered special
convolutional codes, and, as expected, $FU(L, n, k, m)$ gives the
performance of an $(n_c, k_c)$ q-ary CRC code when $n = k = 1$,
$2^k = q$, and $m = n_c - k_c$. A more detailed analysis of particular
codes, based on worst-case scenarios [4], can also be used to
analyze the performance, or define code design objectives.
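
The bound of Proposition 1 is easy to evaluate exactly. The sketch below assumes the three-case form (zero for L <= m, one ratio for L = m + 1, another for L > m + 1), with numerators and denominators counting patterns whose first and last blocks are nonzero; the function name is ours and m >= 1 is assumed. Taking n = k, so that $2^k$ plays the role of the symbol alphabet size q, it returns the familiar CRC figures.

```python
from fractions import Fraction


def undetected_fraction_bound(L, n, k, m):
    """Bound on the fraction of duration-L error patterns missed by an (n, k, m) code.

    Assumes the three-case form of Proposition 1 and m >= 1, L >= 1.
    """
    if m < 1 or L < 1:
        raise ValueError("this sketch assumes m >= 1 and L >= 1")
    if L <= m:
        return Fraction(0)
    # Error patterns spanning exactly L blocks: first and last blocks nonzero.
    total = (2**n - 1) ** 2 * 2 ** (n * (L - 2))
    if L == m + 1:
        # Codewords produced by a single nonzero input block.
        return Fraction(2**k - 1, total)
    # Input spans L - m blocks with nonzero first and last blocks.
    return Fraction((2**k - 1) ** 2 * 2 ** (k * (L - m - 2)), total)


if __name__ == "__main__":
    # Binary unit-rate case with 16 memory cells, as for a 16-bit CRC.
    print(undetected_fraction_bound(17, 1, 1, 16))  # bursts of length m + 1
    print(undetected_fraction_bound(40, 1, 1, 16))  # longer bursts
```

For long bursts the bound collapses to $2^{-km}$, independent of L, which is why the delay advantage discussed above does not cost detection power.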


Fig. 1 - Encoder/decoder implementation for systematic
codes: x is a vector with k data bits, and y contains n - k
parity-check bits; the matrices $P_i$ and $H_i$ have dimensions
k x k and k x (n - k), respectively.
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Abstract  —  The  table-based  soft- decision  convolu¬ 
tional  decoding  method  presented  here  performs  a 
reduced  tree  search  as  compared  to  the  M-algorithm. 
The  degree  of  tree-searching  is  adapted  to  the  state 
of  the  channel  by  using  a  syndrome  sequence  and  pre¬ 
computed  information  stored  in  a  memory  table.  This 
results  in  a  significant  reduction  in  computational 
complexity  while  maintaining  bit  error  rate  perfor¬ 
mance  comparable  to  the  M-algorithm  on  a  Rayleigh 
flat-fading  channel. 

I.  Table-Based  Algorithm 

We restrict the presentation to rate one-half convolutional
coding, although the algorithm presented may be extended
to other coding rates. Let the encoded sequence
be $v = (v_0^{(1)}, v_0^{(2)}, v_1^{(1)}, v_1^{(2)}, \ldots) = (v_0, v_1, v_2, \ldots)$, where
$v_i = (v_i^{(1)}, v_i^{(2)})$. Let the corresponding received sequence
of real-valued (soft) symbols at the receiver be
$r = (r_0^{(1)}, r_0^{(2)}, r_1^{(1)}, r_1^{(2)}, \ldots) = (r_0, r_1, \ldots)$. The sequence r may
be hard-quantized (sign detector) to generate a binary received
sequence $b = (b_0^{(1)}, b_0^{(2)}, b_1^{(1)}, b_1^{(2)}, \ldots) = (b_0, b_1, \ldots)$.
The data-independent syndrome sequence $s = (s_0, s_1, s_2, \ldots)$
is defined as $s = bH^T$, where H is the parity check matrix.
A section of the syndrome sequence $s_{[t, t+\tau]}$ is generated from
$b_{[t-\nu, t+\tau]}$, where $\nu$ is the constraint length of the encoder.
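
To make the "data-independent" property concrete, here is a minimal sketch for a rate one-half code. It assumes the standard polynomial form of the syndrome, $s(D) = b^{(1)}(D) g_2(D) + b^{(2)}(D) g_1(D)$ over GF(2), which is one way of writing $s = bH^T$; the generator polynomials and helper names are illustrative and not taken from the abstract.

```python
def poly_mul_gf2(a, b):
    """Multiply two binary polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                out[i + j] ^= bj
    return out


def encode(u, g1, g2):
    """Rate-1/2 feedforward encoder: returns the two output streams."""
    return poly_mul_gf2(u, g1), poly_mul_gf2(u, g2)


def syndrome(b1, b2, g1, g2):
    """Syndrome of the hard-decision streams b1, b2 for generators g1, g2."""
    s1 = poly_mul_gf2(b1, g2)
    s2 = poly_mul_gf2(b2, g1)
    return [x ^ y for x, y in zip(s1, s2)]


if __name__ == "__main__":
    g1, g2 = [1, 1, 1], [1, 0, 1]          # illustrative (7, 5) generators
    v1, v2 = encode([1, 0, 1, 1], g1, g2)
    print(syndrome(v1, v2, g1, g2))        # all zeros for an error-free codeword
    e1 = [0, 0, 1, 0, 0, 0]                # single channel error on stream 1
    b1 = [x ^ y for x, y in zip(v1, e1)]
    print(syndrome(b1, v2, g1, g2))        # depends on e1 only, not on the data
```

Because the syndrome is linear and vanishes on every codeword, it is a function of the error pattern alone, which is what allows it to index a precomputed table.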

In the M-algorithm, at each time-step t, each of the M
paths $p^{(i)}_{[0,t-1]}$, $1 \le i \le M$, from time t - 1 is extended with
both branch extensions in the code tree to form a total of
2M paths from which the best M paths are chosen [1]. The
table-based algorithm stores $M_c$ paths at any given time with
$M_c \le M$ and differs from the M-algorithm as follows [2]:

For each path $p^{(i)}$, a syndrome sequence $s^{(i)}_{[t,t+\gamma-1]}$ is calculated.
If $s^{(i)}_{[t,t+\gamma-1]} = 0$, $p^{(i)}_{[0,t-1]}$ is extended only with one
branch extension $p^{(i)}_t = b_t$, i.e., no additional paths are generated.
If $s^{(i)}_{[t,t+\gamma-1]} \ne 0$, a finite section of $s^{(i)}$ is used to
retrieve a memory table entry that indicates if a single branch
extension must be considered with $p^{(i)}_t = b_t$ as above or if
both branch extensions of $p^{(i)}_{[0,t-1]}$ must be considered. If
$s^{(i)}_{[t,t+\gamma-1]} = 0$ for the path $p^{(i)}$ with the best metric, the other
$M_c - 1$ paths are discarded, and the best path is simply extended
with the received symbols $(b_t, b_{t+1}, \ldots)$ until the next
nonzero syndrome bit occurs. In this stage, $M_c = 1$ and
the algorithm is in a depth-first search mode.

II.  Performance 

A framed system with interleaving similar to the IS-54 North
American digital cellular standard is used, with F = 84 information
bits and $\nu = 5$ tail bits in each frame. A time-correlated
Rayleigh flat-fading model is used for the channel. At the
receiver, ideal estimation of the fading coefficients is assumed.
Simulations were performed for the M-algorithm (M = 8), the
Viterbi algorithm and the table-based algorithm (syndrome
length $\beta = 15$, $M_c = 8$, $\gamma = 8$). Figure 1 shows the decoded
information bit error rates for the three algorithms and Figure
2 shows the average number of paths per time-step for which
both branch extensions were considered, which is representative
of the reduction in computation.

1 Havish Koorapaty is with the Dept. of Electrical and Computer
Engineering. His work was supported in part by Ericsson Inc.
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We  analyze  the  relationship  between  the  MDL  (Mini¬ 
mum  Description  Length)  estimator  (posterior  mode)  and  the 
P.B.E.  (projected  Bayes  estimator)  for  exponential  families, 
where  the  P.B.E.  is  obtained  by  projecting  the  B.E.  (Bayes 
estimator,  i.e.  posterior  mixture)  onto  the  original  exponen¬ 
tial  family  and  is  equal  to  the  B.E.  under  a  certain  condition. 

An exponential family is defined as $S(\Theta) = \{p(x|\theta) =
\exp(\theta^i x_i - \psi(\theta)) \mid \theta \in \Theta\}$ (the range of the random variable x is
$\mathcal{X} \subseteq \mathbb{R}^n$), where $\Theta$ is a connected subset of $\mathbb{R}^n$ and $\theta^i x_i$
denotes $\sum_i \theta^i x_i$. $\theta$ is called the canonical parameter or the
$\theta$-coordinates. We also define the expectation parameter $\eta$
($\eta$-coordinates) as $\eta_i = E_\theta(x_i)$. $\theta$ and $\eta$ form a dual pair from the
point of view of information geometry [1]. Let $g^{ij}$ denote the
Fisher information matrix with respect to $\eta$. We define $g_{ij} =
E_\theta((x_i - \eta_i)(x_j - \eta_j))$ and $T_{ijk} = E_\theta((x_i - \eta_i)(x_j - \eta_j)(x_k - \eta_k))$.
Note that $g_{ij}$ equals the inverse of $g^{ij}$. We refer to a function
f which maps $\bigcup_{i=0,1,\ldots} \mathcal{X}^i$ to $\mathcal{P}$ (any set of probability
distributions) as an estimator. We let $f[x^N]$ denote the image of
$x^N$ by f and $f[x^N](x)$ denote its density at x. Hereafter, we
let $\hat\eta$ denote the maximum likelihood estimate (MLE) for $\eta$,
and $\hat T$ and $\hat g$ denote their values at $\eta = \hat\eta$ respectively. Finally,
we let 'ln' denote the natural logarithm.

We define the B.E. with the prior $w(\theta)d\theta$ as $f_w[x^N](x)
= p_w(x|x^N) = \int p(x|\theta) w(\theta|x^N) d\theta$, where $w(\theta|x^N)$
denotes the posterior density of $\theta$. We let $w_J$ denote
the Jeffreys prior ($\propto (\det |g_{ij}|)^{1/2}$). Among the B.E.'s,
$f_{w_J}$ is particularly important, since it is known ([2])
that $\sup_{\theta \in \Theta} D(p(\cdot|\theta) \| f_w[x^N])$ (D denotes Kullback-Leibler
divergence) is asymptotically minimized when $w = w_J$,
i.e. $f_{w_J}$ has the minimax property. We define the projection
of $f_w$ to $S(\Theta)$ (let $\hat f_w$ denote it) as $\hat f_w[x^N] =
\arg\min_{p \in S(\Theta)} D(f_w[x^N] \| p)$. We refer to $\hat f_w$ as the P.B.E. with
the prior w. Define $\bar\eta = \int \eta(\theta) w(\theta|x^N) d\theta$; then we can show
$\eta_i(\hat f_w[x^N]) = \bar\eta_i$ under a certain weak condition. Using this
fact, we can show the following for $\hat f_w$.

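Theorem 1 Under a certain weak condition, $\eta_i(\hat f_w[x^N]) =
\hat\eta_i + N^{-1} \partial \ln w(\hat\theta)/\partial\theta^i + O(N^{-3/2}\sqrt{\ln N})$ holds. When $w(\theta)$
is uniform over $\Theta$, $\eta_i(\hat f_w[x^N]) = \hat\eta_i + O(e^{-cN})$ holds.

Corollary 1 Under a certain weak condition, $\eta_i(\hat f_{w_J}[x^N]) =
\hat\eta_i + \hat T_{ijk}\hat g^{jk}/2N + O(N^{-3/2}\sqrt{\ln N})$ holds.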

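The MDL estimator with respect to prior $w(\theta)d\theta$ is defined
as $\theta_{mdl} = \arg\min_{\theta \in \Theta} (-\ln p(x^N|\theta) - \ln w(\theta) + \ln \det |g_{ab}(\theta)|/2)$
and $f^w_{mdl}[x^N] = p(\cdot|\theta_{mdl})$ ([3, 5]). When $w(\theta)d\theta \propto d\eta$, we let
$f^\eta_{mdl}$ denote $f^w_{mdl}$. We show the following for $f^w_{mdl}$.

Lemma 1 $\eta_i(f^w_{mdl}[x^N]) = \hat\eta_i + N^{-1} \partial \ln w(\hat\theta)/\partial\theta^i
- \hat T_{ijk}\hat g^{jk}/2N + O(N^{-2})$ holds. In particular, $\eta_i(f^\eta_{mdl}[x^N]) =
\hat\eta_i + \hat T_{ijk}\hat g^{jk}/2N + O(N^{-2})$ holds.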

We let $f^\theta_{bc}$ denote the bias-corrected MLE with respect to $\theta$
(see for example [1]). Concerning the expectation parameter
of $f^\theta_{bc}$, we can show the following.

Lemma 2 $\eta_i(f^\theta_{bc}[x^N]) = \hat\eta_i + \hat T_{ijk}\hat g^{jk}/2N + O(N^{-2})$ holds.

We can show the following theorems using Theorem 1,
Corollary 1 and Lemmas 1, 2.

1) c/o C&C Res. Labs., NEC Corp. e-mail tak@SBL.CL.nec.co.jp
2) RWCP: Real World Computing Partnership

Theorem 2 Under a certain weak condition, the differences
between $\eta(f^\eta_{mdl}[x^N])$, $\eta(\hat f_{w_J}[x^N])$, and $\eta(f^\theta_{bc}[x^N])$ are of order
$O((\ln N)^{1/2} N^{-3/2})$.

Theorem 3 Under a certain weak condition, the differences
between $\eta(f^{w_J}_{mdl}[x^N])$, $\eta(\hat f_{d\theta}[x^N])$, and $\hat\eta$ are of order $O(1/N^2)$.

We summarize the above two theorems in Table 1, where we
ignore o(1/N) terms. These results suggest a striking symmetry
between the two estimators.

prior w(θ)dθ     dθ             sqrt(det|g_ij|) dθ    dη
f̂_w              η-unbiased     θ-unbiased            -
f_mdl            -              η-unbiased            θ-unbiased

Table 1: dependency of estimators on prior

We exhibit an example (Bernoulli sources) below. Define
$S = \{p(x|\eta) = \eta^x (1 - \eta)^{1-x} \mid 0 < \eta < 1\}$ (x = 0, 1), and
$\theta = \ln(\eta/(1 - \eta))$. Let k denote the number of 1s in $x^N$.
In this case, $\eta(\hat f_{w_J}[x^N]) = (k + 0.5)/(N + 1)$
holds, which is well known as the Laplace estimator. Next, we
derive the MDL estimator. The total description length for
MDL with respect to $\eta$ is $-k \ln \eta - (N - k) \ln(1 - \eta) - (\ln \eta +
\ln(1 - \eta))/2 = -(k + 0.5) \ln \eta - (N - k + 0.5) \ln(1 - \eta)$. This is
minimized when $\eta = (k + 0.5)/(N + 1)$, which strictly equals
$\eta(\hat f_{w_J}[x^N])$. Finally, we derive the bias-corrected MLE. Since
$T_{111} = E_\theta((x - \eta)^3) = \eta(1 - \eta)(1 - 2\eta)$, we have $\hat T_{111}\hat g^{11} = 1 -
2\hat\eta = 1 - 2k/N$. Hence, we have $\eta_{bc} = k/N + (1 - (2k/N))/2N +
O(1/N^2) = \eta(\hat f_{w_J}[x^N]) + O(1/N^2)$.
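
The Bernoulli example can be checked numerically. The sketch below minimizes the description length $-(k + 0.5)\ln\eta - (N - k + 0.5)\ln(1 - \eta)$ on a grid and compares the minimizer with $(k + 0.5)/(N + 1)$ and with the bias-corrected MLE $k/N + (1 - 2k/N)/2N$; the function names and the grid resolution are our own choices.

```python
import math


def description_length(eta, k, N):
    """MDL description length (in nats) for a Bernoulli source, as in the example."""
    return -(k + 0.5) * math.log(eta) - (N - k + 0.5) * math.log(1.0 - eta)


def mdl_estimate_on_grid(k, N, steps=200000):
    """Grid minimizer of the description length over eta in (0, 1)."""
    best = min(range(1, steps), key=lambda i: description_length(i / steps, k, N))
    return best / steps


def bias_corrected_mle(k, N):
    """MLE k/N plus the first-order correction (1 - 2k/N) / (2N)."""
    p = k / N
    return p + (1.0 - 2.0 * p) / (2.0 * N)


if __name__ == "__main__":
    k, N = 3, 10
    print(mdl_estimate_on_grid(k, N), (k + 0.5) / (N + 1), bias_corrected_mle(k, N))
```

A short calculation shows the gap between the last two quantities is exactly $(k/N - 1/2)/(N(N + 1))$, consistent with the stated $O(1/N^2)$ agreement.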

Theorem 2 implies that we can approximate the P.B.E.
with Jeffreys prior (which is hard to derive in general) simply
by deriving an appropriate MDL estimator or a bias-corrected
MLE. Some important topics of future research are as follows:
to analyze the difference between the B.E. and the P.B.E.,
and to evaluate directly the performance of $\hat f_{w_J}$, $f^\eta_{mdl}$, and
$f^\theta_{bc}$. The argument in this abstract is restricted to the case in
which the class of sources is an i.i.d. exponential family. The
same problem for Markov sources is discussed in [4]. We would
also like to analyze the case for curved exponential families.
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Abstract  —  We  apply  the  method  of  complexity 
regularization  to  learn  concepts  from  large  concept 
classes.  The  method  is  shown  to  automatically  find 
the  best  balance  between  the  approximation  error  and 
the estimation error. In particular, the error probability
of the obtained classifier is shown to decrease as
$O(\sqrt{\log n / n})$ to the achievable optimum, for large
nonparametric classes of distributions, as the sample size
n grows.

In pattern recognition, or concept learning, the value of a
{0, 1}-valued random variable Y is to be predicted based upon
observing an $\mathbb{R}^d$-valued random variable X. A prediction rule
(or decision) is a function $\phi: \mathbb{R}^d \to \{0, 1\}$, whose performance
is measured by its error probability $P\{\phi(X) \ne Y\}$. The error
probability $L^* = P\{g^*(X) \ne Y\}$ of the optimal decision $g^*$ is
called the Bayes risk. Assume that a training sequence

$$D_n = ((X_1, Y_1), \ldots, (X_n, Y_n))$$

of independent, identically distributed random variables is
available, where the $(X_i, Y_i)$ have the same distribution as
(X, Y), and $D_n$ is independent of (X, Y). A classifier is a function
$\phi_n: \mathbb{R}^d \times (\mathbb{R}^d \times \{0, 1\})^n \to \{0, 1\}$, whose error probability
is the random variable $L(\phi_n) = P\{\phi_n(X, D_n) \ne Y \mid D_n\}$.

The method of empirical risk minimization picks a classifier
from a class C of functions $\mathbb{R}^d \to \{0, 1\}$ that minimizes
the empirical error probability over C. More precisely,
define the empirical error probability of a decision $\phi$
by $L_n(\phi) = (1/n) \sum_{i=1}^{n} I_{\{\phi(X_i) \ne Y_i\}}$, where I denotes the
indicator function. Let $\phi_n$ denote a classifier chosen from C by
minimizing $L_n(\phi)$, i.e., $L_n(\phi_n) \le L_n(\phi)$, $\phi \in C$. Vapnik and
Chervonenkis [4], [5] proved distribution-free exponential
inequalities for empirical error minimization. One of the
implications is that $E L(\phi_n) - \inf_{\phi \in C} L(\phi) \le c\sqrt{(V \log n)/n}$, where
V is the VC dimension of the class C and c is a universal
constant (independent of the distribution). Thus, the error
probability of the empirically chosen decision is always within
$O(\sqrt{\log n / n})$ of that of the best in C. Unfortunately, if $V < \infty$,
then for some distributions, $\inf_{\phi \in C} L(\phi)$ may be arbitrarily far
from $L^*$. On the other hand, if $V = \infty$, then $L(\phi_n) - \inf_{\phi \in C} L(\phi)$
will be large for some distributions [3], [5].

A possible solution to this problem may be derived from the
idea of structural risk minimization (Vapnik and Chervonenkis
[5]), also known as complexity regularization (see Barron [1],
Barron and Cover [2]). The basic idea is to minimize the
sum of the empirical error and a term corresponding to the
"complexity" of the candidate classifier. In our application,
this complexity is a simple function of the VC dimension of
the class from which the candidate classifier is taken.

Theorem 1 Let $C^{(1)}, C^{(2)}, \ldots$ be a sequence of classes of
classifiers whose VC dimensions $V_1, V_2, \ldots$ are finite. Let $\phi_n$ be
the classification rule based on structural risk minimization.
Then for all n,

$$E\{L(\phi_n)\} - L^* \le \inf_{k \ge 1} \left( \sqrt{\frac{16 V_k \log n + 8(k+1)}{n}} + \Big( \inf_{\phi \in C^{(k)}} L(\phi) - L^* \Big) \right).$$

This result is close in spirit to those obtained by Barron
[1], and Barron and Cover [2], who select a classifier from
a countable list of candidates by minimizing the sum of the
empirical error and a properly chosen penalty. A significant
difference is that the method we study here does not restrict
the search to a countable set of candidates, thus allowing better
approximation ability.

Corollary 1 Let $C^{(1)}, C^{(2)}, \ldots$ be a sequence of classes of
classifiers such that the VC dimensions $V_1, V_2, \ldots$ are all finite.
Assume further that the Bayes rule is contained in the union
of these classes, i.e., $g^* \in C^* = \bigcup_{j=1}^{\infty} C^{(j)}$. Let K be the
smallest integer such that $g^* \in C^{(K)}$. Then for every n, the error
probability of the classification rule based on structural risk
minimization, $\phi_n$, satisfies

$$E L(\phi_n) - L^* \le 4 \sqrt{\frac{V_K \log n + K/2 + 6}{n}}.$$

Corollary 1 shows that the rate of convergence is always of
the order of $\sqrt{\log n / n}$, and the constant factor $V_K$ depends
on the distribution. The number $V_K$ may be viewed as the
inherent complexity of the Bayes rule for the distribution. One
great advantage of structural risk minimization is that it finds
automatically where to look for the optimal classifier.
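
The selection step of structural risk minimization can be sketched in a few lines. The penalty used below, $\sqrt{(V_k \log n + k)/n}$, is a hypothetical stand-in chosen only to have the right shape; the abstract does not state the exact penalty, and the function names and the sample numbers are ours.

```python
import math


def penalty(vc_dim, k, n):
    """Hypothetical complexity penalty for the k-th class (an assumption, not the paper's)."""
    return math.sqrt((vc_dim * math.log(n) + k) / n)


def select_class(empirical_errors, vc_dims, n):
    """Return (k, score) minimizing empirical error plus penalty; classes are numbered from 1.

    empirical_errors[k-1] is the smallest empirical error achieved within class k.
    """
    scores = [
        err + penalty(v, k, n)
        for k, (err, v) in enumerate(zip(empirical_errors, vc_dims), start=1)
    ]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]


if __name__ == "__main__":
    errors = [0.30, 0.10, 0.09]   # richer classes fit the sample better
    vc_dims = [1, 3, 50]
    print(select_class(errors, vc_dims, n=1000))
    print(select_class(errors, vc_dims, n=10**7))
```

With few samples the penalty dominates and a small class is chosen; as n grows the penalty shrinks and the rule moves to richer classes, which is the automatic balance between approximation and estimation error that the text describes.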
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Abstract  —  In  this  paper,  minimax  expected  redun¬ 
dancies  over  memoryless  source  classes  of  smooth  den¬ 
sities  are  studied,  through  their  connections  with  ac¬ 
cumulated  prediction  errors  and  using  available  tech¬ 
niques  from  nonparametric  statistics.  To  derive  lower 
bounds  on  the  minimax  expected  redundancy  rates, 
two  methods  are  used  and  compared.  One  is  the  As¬ 
souad’s  technique  from  statistical  density  estimation 
and  the  other  is  the  information-theoretic  (general¬ 
ized)  Fano’s  inequality.  Both  methods  are  applied 
to  hypercube  sub-classes  and  a  connection  between 
Assouad’s  and  Fano’s  is  established  using  a  packing 
number  result  from  error-correcting  coding  theory. 
Finally,  optimal  (rate)  codes,  which  achieve  the  min¬ 
imax  rate  lower  bounds  on  expected  redundancy,  are 
formed  based  on  optimal  density  estimators. 

Summary 

Minimax  expected  redundancy  was  studied  as  early  as  1973 
by  Davisson  [3]  for  Markov  sources.  For  other  regular  para¬ 
metric  source  classes,  minimax  lower  bounds  on  expected  re¬ 
dundancy  follow  from  [2]  (see  also  [7]).  Lower  bounds  on 
expected  redundancy,  minimax  or  Rissanen’s  pointwise  one 
([8]),  play  an  important  role  in  Rissanen’s  Stochastic  Com¬ 
plexity  Theory  since  they  justify,  together  with  codes  achiev¬ 
ing  these  lower  bounds,  the  complexity  measures  of  the  source 
classes.  While  Rissanen’s  pointwise  lower  bound  has  difficulty 
extending  to  nonparametric  (or  smooth)  classes  of  densities, 
the  minimax  approach  has  its  natural  counterpart  for  those 
classes. The minimax expected redundancy rates measure
the  complexities  of  the  nonparametric  source  classes,  just  as 
in  the  regular  parametric  case. 

For a given memoryless data string $x^n = (x_1, x_2, \ldots, x_n)$ and
without knowing the density f on [0,1] which generated the
data, we would like to compress the data in an efficient way;
that is, we would like to find a joint density $q_n$ on the n-tuples,
which may be regarded as corresponding to a prefix code with
code length $-\log q_n(x^n)$ for an n-tuple $x^n$, such that its
expected redundancy is small over, say, source classes of smooth
densities. On the other hand, the expected redundancy of $q_n$
can be decomposed into the accumulated prediction or estimation
error $\sum_{t=1}^{n} E_{f^{t-1}} D(f, q_{t-1})$ because our source is memoryless
with density f on [0,1]. The estimation error $E_{f^{t-1}} D(f, q_{t-1})$
in terms of information divergence D is closely related to
other errors which correspond to real distances such as the
Hellinger distance.


1 This work was partially supported by ARO Grant DAAH04-94-G-0232
and NSF Grant DMS-9322817.

Density estimation errors in terms of Hellinger distance have
been well studied in the statistical density estimation literature
(cf. Birge [1]) for classes of smooth densities. Therefore,
to obtain minimax lower bounds on redundancy over the same
type of smooth density classes, we may borrow techniques
from density estimation.
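
The decomposition of expected redundancy into accumulated prediction errors can be verified by brute force in a toy setting. The sketch below replaces densities on [0,1] by a Bernoulli source and uses an add-one-half sequential predictor; both are our simplifying assumptions, chosen so that every expectation can be enumerated exactly.

```python
import itertools
import math


def predict_one(ones, t):
    """Predictive probability of a 1 after `ones` ones in t symbols (add-1/2 rule)."""
    return (ones + 0.5) / (t + 1)


def kl_bernoulli(p, q):
    """Divergence D(Bernoulli(p) || Bernoulli(q)) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def expected_redundancy(p, n):
    """E[log f^n(X^n) - log q_n(X^n)] by enumerating all binary n-tuples."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        ones, log_f, log_q = 0, 0.0, 0.0
        for t, bit in enumerate(x):
            q1 = predict_one(ones, t)
            log_q += math.log(q1 if bit else 1 - q1)
            log_f += math.log(p if bit else 1 - p)
            ones += bit
        total += math.exp(log_f) * (log_f - log_q)
    return total


def accumulated_prediction_error(p, n):
    """Sum over t of the expected divergence between the source and the t-th prediction."""
    total = 0.0
    for t in range(n):
        for ones in range(t + 1):
            prob = math.comb(t, ones) * p**ones * (1 - p) ** (t - ones)
            total += prob * kl_bernoulli(p, predict_one(ones, t))
    return total


if __name__ == "__main__":
    print(expected_redundancy(0.3, 8), accumulated_prediction_error(0.3, 8))
```

The two numbers agree up to rounding, which is the chain-rule identity that lets lower-bound techniques for estimation error be brought to bear on redundancy.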

One  well-known  technique  is  Assouad’s  method.  It  bounds 
the  minimax  estimation  error  from  below  by  the  average  es¬ 
timation  error  over  a  sub-class,  i.e.,  by  the  Bayes  estimation 
error  corresponding  to  the  uniform  prior  on  the  sub-class.  This 
sub-class is indexed by a hypercube whose dimension can be
optimized in the end. More importantly, the sub-class is
chosen in such a way that the Hellinger distances of densities
on the neighboring vertices of the hypercube can be calculated
easily. Assouad's method can also be understood
through another useful and well-known technique called Le
Cam's method, which deals with two sets of hypotheses (cf.
[9] and [10]).

The other (more powerful) technique is the generalized
Fano's inequality (cf. [4] and [6]), which deals with a finite
number of hypotheses. Using a packing number result from
error-correcting coding theory ([5]), a subset of the vertices
of the hypercube class can be selected to which Fano's
inequality is applied, giving the same rate lower bounds on
redundancy as Assouad's method (cf. [9]).

However, since minimax and summation do not exchange,
lower bounds on redundancy do not follow directly from the
lower bounds in density estimation, but require a separate
Assouad-type argument.

As  to  upper  bounds,  when  the  density  /  is  bounded  away 
from  zero,  the  minimax  rate  lower  bounds  on  expected  redun¬ 
dancy  can  be  achieved  if  we  take  qt  as  any  optimal  rate  density 
estimator  based  on  the  first  t  observations.  Hence  the  min¬ 
imax  rates  in  the  lower  bounds  are  the  optimal  redundancy 
rates  over  classes  of  smooth  densities. 
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Abstract  —  The  algorithm  for  designing  a  pattern 
classifier,  which  uses  MDL  criterion  and  a  binary  data 
structure, is proposed. The algorithm gives a partitioning
of the space of the K-dimensional attribute
and gives an estimated probability model for this partitioning.
The volume of bins in this partitioning is
asymptotically upper bounded by $O((\log N / N)^{K/(K+2)})$
for large N in probability, where N is the length of the
training sequence. The redundancy of the code length
and the divergence of the estimated model are asymptotically
upper bounded by $O(K(\log N / N)^{2/(K+2)})$. The
classification error is asymptotically upper bounded
by $O(K^{1/2}(\log N / N)^{1/(K+2)})$.


I.  Introduction 

Pattern classification is a problem of assigning each data
attribute X, which is typically obtained from measuring
instruments, a label Y which indicates the class that data
belongs to [1]. Suppose that the label Y assumes a value
y in a binary set $\mathcal{Y} = \{0, 1\}$ according to a probability
distribution $P_Y(y)$ and the observed attribute, denoted by
$X = (X_1, X_2, \ldots, X_K)$, assumes a value $x = (x_1, x_2, \ldots, x_K)$
in a subset $\mathcal{X} = [0, 1)^K$. The optimal decision rule is
expressed as $y = f(x) = \arg\max_{y \in \mathcal{Y}} P_{Y|X}(y|x)$. Thus, the
problem of designing a pattern classifier turns out to be the
problem of estimating $P_{Y|X}(y|x)$ from a given training
sequence $\{(X_i, Y_i), i = 1, \ldots, N\}$ of length N. We assume that
$(X_i, Y_i)$ are independent and identically distributed.

In  the  case  of  discrete- valued  attributes,  Quinlan  and 
Rivest[2]  first  showed  the  possibility  of  applying  Rissanen’s 
Minimum  Description  Length  principle[3]  to  the  construction 
of  decision  trees  for  the  pattern  classification  problem. 

We  propose  an  algorithm  based  on  MDL  two  stage  coding 
to  design  a  pattern  classifier  using  a  sequence  of  independent 
training  examples.  A  binary  tree  structure  is  used  to  represent 
the  partition  and  MDL  criterion  is  used  to  optimize  the  tree. 
The  asymptotic  performance  is  derived. 

II.  Algorithm 

Our strategy is as follows: For one-dimensional continuous
valued X, its range $\mathcal{X} = [0, 1)$ is partitioned into a finite number s
of subsets called bins $b_i$, $i = 1, \ldots, s$, and then the probability model
$P_{Y|X}(y|x)$ is obtained by the histogram-like estimator. This
approach is used by Rissanen [4]. However, the complexity
of the estimated model soon becomes excessively large unless
the model complexity is appropriately controlled. Here, we
restrict the partitioning such that $|b_i| = 2^{-d_i}$, $i = 1, \ldots, s$,
where $d_i \in \mathbb{N}$. With this restriction, bins are represented as
leaf nodes of a complete binary tree. Let $t = t_1 t_2 \cdots t_d$ be a
path, a string of edges of length d in the tree leading from the root
node to a leaf. With each leaf node represented by a path t,
we associate a bin $b(t) = [0.t00\cdots, 0.t11\cdots)$. Let $L_t$ be the
cost of the leaf node t, that is, the code length associated with
bin $b_t$. The minimization of the sum of the costs can be done
easily with dynamic programming, which uses the recursive
structure of the binary tree [5].
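
The recursion over the binary tree can be sketched as follows for the one-dimensional case. The leaf cost used here (one structure bit plus the Krichevsky-Trofimov code length of the labels falling in the bin) is our own illustrative choice; the abstract does not specify the exact two-stage code, and all names below are hypothetical.

```python
import math


def kt_code_length(n0, n1):
    """Code length in bits of a label string with n0 zeros and n1 ones (KT estimator)."""
    length, zeros, ones = 0.0, 0, 0
    for _ in range(n0):
        length -= math.log2((zeros + 0.5) / (zeros + ones + 1))
        zeros += 1
    for _ in range(n1):
        length -= math.log2((ones + 0.5) / (zeros + ones + 1))
        ones += 1
    return length


def best_partition(points, lo=0.0, hi=1.0, depth=0, max_depth=6):
    """Return (cost, bins): the cheapest dyadic partition of [lo, hi) for (x, y) samples."""
    n1 = sum(y for _, y in points)
    n0 = len(points) - n1
    leaf_cost = 1.0 + kt_code_length(n0, n1)   # 1 bit marks "this node is a leaf"
    if depth == max_depth or len(points) <= 1:
        return leaf_cost, [(lo, hi)]
    mid = (lo + hi) / 2
    left = [(x, y) for x, y in points if x < mid]
    right = [(x, y) for x, y in points if x >= mid]
    cost_l, bins_l = best_partition(left, lo, mid, depth + 1, max_depth)
    cost_r, bins_r = best_partition(right, mid, hi, depth + 1, max_depth)
    split_cost = 1.0 + cost_l + cost_r         # 1 bit marks "this node is split"
    if split_cost < leaf_cost:
        return split_cost, bins_l + bins_r
    return leaf_cost, [(lo, hi)]


if __name__ == "__main__":
    data = [((i + 0.5) / 40, int((i + 0.5) / 40 >= 0.5)) for i in range(40)]
    print(best_partition(data))
```

Each node compares the cost of being a leaf with the optimal cost of its two subtrees, so the whole tree is optimized bottom-up in a single pass.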

The binary tree structure is extended to the K-dimensional
attribute case if we let each node $t = t_{11} t_{21} \cdots t_{K1} t_{12} t_{22} \cdots
t_{K2} t_{13} \cdots$ represent a bin $b(t) = \{(x_1, x_2, \ldots, x_K) \mid x_1 \in
b(t^1), x_2 \in b(t^2), \ldots, x_K \in b(t^K)\}$, where $t^i = t_{i1} t_{i2} \cdots$.

The computational time complexity of the algorithm is
dramatically improved over that of [4].


III.  Main  Result 

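Theorem: Assume that $p_X(x)$ and $P_{Y|X}(y|x)$ are upper and
lower bounded and their first differentials are upper bounded,
then we have

$$|b(t(N))| < C \left( \frac{\log N}{N} \right)^{K/(K+2)} \quad \text{a.s.},$$

$$\frac{L}{N} - \int H(P_{Y|X}(0|x))\, p_X(x)\, dx < O\!\left( K \left( \frac{\log N}{N} \right)^{2/(K+2)} \right) \quad \text{a.s.},$$

and

$$\int \sum_{y=0,1} P_{Y|X}(y|x) \log \frac{P_{Y|X}(y|x)}{\hat P_{Y|X}(y|x)}\, p_X(x)\, dx < O\!\left( K \left( \frac{\log N}{N} \right)^{2/(K+2)} \right)$$

for all sufficiently large N, where $|b(t(N))|$ is the volume of
the bin associated with the node t and L is the code length.
The resubstituted classification error R(N) of our classifier
satisfies

$$|R(N) - R_{\min}| < O\!\left( K^{1/2} \left( \frac{\log N}{N} \right)^{1/(K+2)} \right)$$

for all sufficiently large N, where $R_{\min}$ is the Bayes risk.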


References 

[1]  L.  Breiman,  J.  H.  Friedman,  R.  A.  Olshen,  and  C.  J.  Stone, 
Classification  And  Regression  Trees.  Belmont,  California: 
Wadsworth  Inc.,  1984. 

[2]  J.  Quinlan  and  R.  L.  Rivest,  “Inferring  Decision  Trees  Using 
the  Minimum  Description  Length  Principle,”  Inf.  and  Comp., 
vol.  80(3),  pp.  227-248,  1989. 

[3]  J.  Rissanen,  “Universal  Coding,  Information,  Prediction, 
and  Estimation,”  IEEE  Trans.  Inform.  Theory ,  vol.  IT-30, 
pp.  629-636,  July  1984. 

[4] J. Rissanen and B. Yu, "MDL Learning," Progress in
Automation and Information Systems, Springer-Verlag, New York,
1991.

[5]  S.  Itoh,  “A  Piecewise  Linear  Approximation  of  Planar  Curves 
by  MDL  Modeling,”  in  Proceedings  of  1993  IEEE  Information 
Theory  Workshop ,  pp.  34-35,  June  1993. 




An  Extension  on  Learning  Bayesian  Belief  Networks 
Based  on  MDL  Principle 

Joe  Suzuki 

Dept  of  Mathematics,  Osaka  University, 

Toyonaka,  Osaka  560,  Japan 


Abstract  —  Bayesian  belief  network  (BBN)  is 
a  framework  for  representation/inference  of  some 
knowledge with uncertainty [1]. Since the process
of  constructing  a  BBN  manually  by  experts  is  time- 
consuming  in  general,  some  method  supporting  the 
task  is  needed.  We  proposed  an  algorithm  for  ac¬ 
quiring  some  BBN  automatically  from  finite  examples 
based  on  minimum  description  length  (MDL)  princi¬ 
ple  [2].  This  paper  addresses  an  improvement  which 
relaxes  a  constraint  that  the  original  scheme  held  on 
the  representation. 

In  BBNs,  attributes  and  stochastic  dependences  between 
them  are  expressed  as  nodes  and  directed  links  connecting 
them,  respectively,  where  each  attribute  may  be  a  predicate, 
a  numerical  data,  etc.,  and  each  dependence  is  numerically 
expressed  as  the  conditional  probability  of  one  attribute  given 
other  attributes  if  their  dependence  exists.  Therefore,  in  gen¬ 
eral,  BBNs  are  represented  in  terms  of  the  network  structure 
and  the  conditional  probabilities. 

Suppose that we have N possible attributes $j = 1, 2, \ldots, N$
($N \ge 1$), where each attribute value ranges over $A[j] =
\{0, 1, \ldots, \alpha[j] - 1\}$ ($2 \le \alpha[j] < \infty$), and also that we
induce the network structure $g \in G$ of a BBN from n examples
$x_1^n = x_1 x_2 \cdots x_n$ where $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,N})$,
$x_{i,j} \in A[j]$, $j = 1, 2, \ldots, N$, $i = 1, 2, \ldots, n$, and a set of the
possible network structures, G, is prepared. The problem is
to determine the set $\pi[j,g] \subseteq \{1, 2, \ldots, j - 1\}$ ($\pi[1,g] = \emptyset$,
$g \in G$) of attributes which each attribute $j = 1, 2, \ldots, N$
depends on, provided that the N attributes have been arranged
in such an order that the directed dependence is valid
[1]. Then, the number of the possible network structures is
$|G| = \prod_{j=1}^{N} 2^{j-1} = 2^{N(N-1)/2}$.

If we applied the MDL principle to this problem, a possible
description length $L^A(g, x_1^n)$ based on network structure $g \in G$
would be [2]

$$H^A(x_1^n | g) + \frac{k^A(g)}{2} \log \frac{n}{2\pi} + \sum_{j=1}^{N} |S[j,g]| \log \frac{[\Gamma(1/2)]^{\alpha[j]}}{\Gamma(\alpha[j]/2)} + l_G(g)$$

except some constant terms, where $S[j,g] = \prod_{k \in \pi[j,g]} A[k]$,
$H^A(x_1^n | g)$ and $k^A(g)$ are respectively the empirical entropy
and the number of the conditional probabilities to be fixed,
and $l_G(g)$ is the description length of model $g \in G$. In the
original scheme [2], the network structure $g \in G$ that minimizes
$L^A(g, x_1^n)$ is selected from n examples so that the best
compromise between the complexity of the network and the
fitness of the n examples to the network is achieved in terms
of the description length.

The description length $L^A(g, x_1^n)$ is optimal in the sense
that the redundancy $E_\theta[L^A(g, x_1^n) + \log p(x_1^n|\theta)]$ is asymptotically
upperbounded by the optimal minimax redundancy except
the length of $l_G(g)$ for $g \in G$ when the source $\theta$ generating
the data $x_1^n \in (\prod_{j=1}^{N} A[j])^n$ is expressed as one of those
BBNs, where $p(x_1^n|\theta)$ is the probability of $x_1^n \in (\prod_{j=1}^{N} A[j])^n$
given source $\theta$. However, we should note that in some cases
where some $\alpha[j]$ is large or $A[j]$ takes continuous values for
an attribute, $j = 1, 2, \ldots, N$, the j-th node is not connected
to any other nodes even when the dependence is actually
significant. So, we propose such an extended scheme
that the alphabet $A[j]$ is clustered into another alphabet
$B[j,g] = \{0, 1, \ldots, \beta[j,g] - 1\}$ ($1 \le \beta[j,g] \le \alpha[j]$), where
$y \cap y' = \emptyset$ for any $y \ne y' \in B[j,g]$ and $\bigcup_{y \in B[j,g]} y = A[j]$, and
we implement a similar procedure for such a new alphabet
$B[j,g]$, for $j = 1, 2, \ldots, N$ and $g \in G$. In most cases, such
a clustering procedure is manually done as a pre-process for
both learning and inference processes.

Therefore, the structure $g \in G$ refers to the clustering
structure $B[j,g]$ as well as the network structure $\pi[j,g]$, for
$j = 1, 2, \ldots, N$, in the proposed scheme. The counterpart
$L^B(g, x_1^n)$ of $L^A(g, x_1^n)$ is

$$H^B(x_1^n | g) + \frac{k^B(g)}{2} \log \frac{n}{2\pi} + \sum_{j=1}^{N} |T[j,g]| \log \frac{[\Gamma(1/2)]^{\beta[j,g]}}{\Gamma(\beta[j,g]/2)} + l_G(g), \qquad (1)$$

where

$$H^B(x_1^n | g) = \sum_{j=1}^{N} \sum_{t \in T[j,g]} \sum_{y \in B[j,g]} m[y,t,j] \log \left\{ |y| \, \frac{m[t,j] + \beta[j,g]/2}{m[y,t,j] + 1/2} \right\}$$

and $k^B(g) = \sum_{j=1}^{N} (\beta[j,g] - 1) \prod_{k \in \pi[j,g]} \beta[k,g]$, $T[j,g] =
\prod_{k \in \pi[j,g]} B[k,g]$, and $m[t,j]$ and $m[y,t,j]$ denote the occurrence,
in $y_1^n \in (\prod_{j=1}^{N} B[j,g])^n$, of $t \in \prod_{k \in \pi[j,g]} B[k,g]$ and
that of $y \in B[j,g]$ given $t \in \prod_{k \in \pi[j,g]} B[k,g]$, respectively,
for $j = 1, 2, \ldots, N$ and $g \in G$. Note that the same length
$\log\{|y| (m[t,j] + \beta[j,g]/2)/(m[y,t,j] + 1/2)\}$ is assigned to
the $|y|$ symbols in a group $y \in B[j,g]$, assuming that they
occur equiprobably.

Theorem 1: The redundancy $E_\theta[L^B(g, x_1^n) + \log p(x_1^n|\theta)]$
is asymptotically upperbounded by the optimal minimax
redundancy except the length of $l_G(g)$ for $g \in G$ when the source
$\theta$ ranges over the BBNs in which the elements in $A[j]$ can be
clustered into any exclusive groups for $j = 1, 2, \ldots, N$, and
the elements in the same group occur equiprobably.

Theorem 2: The number of the possible structures in
the proposed scheme is $|G| = 2^{N(N-1)/2} \prod_{j=1}^{N} f(\alpha[j])$, where

$$f(\alpha) = \sum_{i=1}^{\alpha} \sum_{j=1}^{i} \frac{(-1)^{i-j} j^{\alpha}}{(i-j)!\, j!}.$$

References 

[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan
Kaufmann, San Mateo, CA (1988).

[2] J. Suzuki, “Learning Bayesian Belief Networks Based on the
Minimum Description Length Principle”, submitted to IEEE
Trans. on Information Theory (1993).




Optimal  Universal  Learning  and  Prediction  of  Probabilistic  Concepts 

Meir  Feder,  Yoav  Freund  and  Yishay  Mansour 

Department of Electrical Engineering - Systems, Tel Aviv University; AT&T Bell Labs, Room 2B-428, Murray Hill NJ;
Department  of  Computer  Science,  School  of  Mathematics,  Tel  Aviv  University. 


I  Introduction 


We consider the following setting of the (supervised) learning
problem. A sequence of input data $x_1, \ldots, x_t, \ldots$ is given, one
by one, and the goal is to predict the corresponding outputs
$y_1, \ldots, y_t, \ldots$. It is assumed that at time $t$ the predictor has
access to the previous input-output pairs $\{(x_i, y_i)\}_{i=1}^{t-1}$, and then
it has to predict $y_t$ given the new input $x_t$. The input and
output are connected by some unknown functional relation,
given by a conditional probability distribution $p_\theta(y|x)$, where
it is only known that $\theta$ belongs to some general index set $\Theta$.

The prediction outcome is an estimated probability distribution
$q_t(\cdot) = q(\cdot \mid y_1^{t-1}, x_1^t)$ for the unobserved value $y_t$.
(The notation $x_i^j$ means $x_i, \ldots, x_j$.) When $y_t$ is revealed, a
prediction “log-loss” $-\log q_t(y_t)$ is incurred. The goal is to
minimize the expected accumulated log-loss, for the entire sequence
of decisions, $E_\theta\{\sum_{t=1}^{n} -\log q_t(y_t)\}$, where the expectation
is with respect to the “true” distribution $P_\theta(y_1^n|x_1^n)$.

Since $\theta$ is unknown, we wish to find a universal predictor,
independent of $\theta$. We note that the simpler problem of predicting
$y_t$ with log-loss, in the absence of the input $x_1^n$, is completely
equivalent to universal coding. As has become evident
from recent results in universal coding, the optimal prediction
is based on a Bayesian approach in which a “mixture” probability
measure $Q(y_1^n) = \int_{\theta \in \Theta} w(d\theta) P_\theta(y_1^n)$ is assigned to the
observation sequence. The “prior” $w(d\theta)$ is chosen to attain

$$\sup_w \int w(d\theta) \sum_{y_1^n} P_\theta(y_1^n) \log \frac{P_\theta(y_1^n)}{\int w(d\theta') P_{\theta'}(y_1^n)}, \qquad (1)$$

i.e., to achieve the capacity of the “channel” between $\Theta$
and $Y_1^n$. The prediction at each time point is given by
$q_t(y_t|y_1^{t-1}) = Q(y_1^t)/Q(y_1^{t-1})$. A classical result in universal
coding [1] states that this encoder attains the min-max
redundancy, implying that the associated predictor minimizes
the maximal extra accumulated log-loss. Recently, it was
shown [2] that the performance of this predictor, given by the
capacity (1), is a lower bound on the performance of any universal
coder, in the sense that no other encoder can have
a smaller redundancy (or a smaller excess log-loss) for “most”
$\theta \in \Theta$.
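The sequential rule $q_t(y_t|y_1^{t-1}) = Q(y_1^t)/Q(y_1^{t-1})$ can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration only: the index set is a hypothetical finite set of Bernoulli parameters, and the prior is uniform rather than the capacity-achieving prior discussed in the text. The sketch only checks that the accumulated sequential log-loss equals $-\log Q(y_1^n)$, as the chain rule requires.

```python
import math

def mixture_log_loss(thetas, prior, ys):
    """Accumulated log-loss (in bits) of the Bayesian mixture predictor.

    thetas: candidate Bernoulli parameters (a hypothetical finite index set)
    prior:  weights w(theta), summing to 1 (uniform here, not capacity-achieving)
    ys:     observed binary sequence y_1, ..., y_n
    """
    posterior = list(prior)
    loss = 0.0
    for y in ys:
        # q_t(y_t) = Q(y_1^t) / Q(y_1^{t-1}) is the posterior-weighted likelihood.
        likelihoods = [th if y == 1 else 1.0 - th for th in thetas]
        q = sum(w * l for w, l in zip(posterior, likelihoods))
        loss += -math.log2(q)
        # Bayes update of the weights once y_t is revealed.
        posterior = [w * l / q for w, l in zip(posterior, likelihoods)]
    return loss

def mixture_probability(thetas, prior, ys):
    """Q(y_1^n) = sum over theta of w(theta) * P_theta(y_1^n)."""
    ones = sum(ys)
    zeros = len(ys) - ones
    return sum(w * th ** ones * (1.0 - th) ** zeros for w, th in zip(prior, thetas))

thetas = [0.1, 0.5, 0.9]
prior = [1 / 3, 1 / 3, 1 / 3]
ys = [1, 1, 0, 1, 1, 1, 0, 1]
sequential = mixture_log_loss(thetas, prior, ys)
block = -math.log2(mixture_probability(thetas, prior, ys))
```

The two quantities `sequential` and `block` agree up to floating-point error, which is the sense in which the mixture measure yields a sequential predictor.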

Our  proposed  solution  for  the  supervised  learning  problem 
is  likewise  Bayesian,  and  the  contribution  of  this  work  lies  in 
determining  the  optimal  way  to  choose  the  Bayesian  prior 
for the supervised learning problem, and observing the strong
sequential,  non-anticipating,  structure  of  the  resulting  univer¬ 
sal  predictor. 


II Optimal Universal Learning

In our problem, where side information $x_1^n$ is given, one may
try to generalize the universal coding results in the following
way. Since $x_1^n$ is known (or will be known as we predict the
output), an optimal predictor may be chosen for each $x_1^n$. This
predictor will be based on a Bayesian universal probability, in
which the weights depend on $x_1^n$, and each such probability
will attain a capacity $C(x_1^n)$ of the channel between $\Theta$ and
$Y_1^n$, given that $x_1^n$. However, this solution turns out to be
unacceptable because it leads to too pessimistic, or “too careful”,
prediction procedures. This is because we try to minimize the
extra loss for the worst $\theta$, and that for each $x_1^n$. In addition,
the resulting $w(d\theta)$ depends on the entire $x_1^n$, and so the
Bayesian mixture probability does not factor into a sequential
assignment.

We overcome these drawbacks by postulating a probability
distribution, $\mu(x_1^n)$, over the input. This is a
common assumption in many learning problems, where,
at least in the training stage, the input is indeed
randomly chosen according to some pre-defined distribution.
The extra average accumulated log-loss is now

$$R(\theta, Q) = \sum_{x_1^n} \mu(x_1^n) \sum_{y_1^n} P_\theta(y_1^n|x_1^n) \log \frac{P_\theta(y_1^n|x_1^n)}{Q(y_1^n|x_1^n)}.$$

Here, also, the solution to $\min_Q \max_\theta R(\theta, Q)$ is the
max-min solution which (almost by definition) is given
by the Bayesian mixture $Q(y_1, \ldots, y_n|x_1, \ldots, x_n) =
\int_{\theta \in \Theta} w(d\theta) P_\theta(y_1, \ldots, y_n|x_1, \ldots, x_n)$, where $w(d\theta)$ is a weight
over $\Theta$, independent of $x_1^n$, which maximizes the conditional
mutual information,

$$I(\Theta; Y_1^n | X_1^n) = \int_\Theta w(d\theta) \sum_{x_1^n} \mu(x_1^n) \sum_{y_1^n} P_\theta(y_1^n|x_1^n) \log \frac{P_\theta(y_1^n|x_1^n)}{Q(y_1^n|x_1^n)}. \qquad (2)$$

The quantity $\sup_w I(\Theta; Y_1^n|X_1^n) = C_n$ can be interpreted as
the capacity of an “auxiliary channel” between $\Theta$ and $Y_1^n$,
with side information $X_1^n$. Similarly to [2] we prove that $C_n$,
which is the loss incurred by our Bayesian predictor, cannot
be improved by any other predictor, for “most” $\theta \in \Theta$.

For prediction we need a sequential probability assignment.
First, we observe that the universal probability
above can always be factored as $Q(y_1, \ldots, y_n|x_1, \ldots, x_n) =
\prod_{t=1}^{n} q(y_t|y_1^{t-1}, x_1^n)$. Furthermore, under the common
assumption in learning theory, $P_\theta(y_1^n|x_1^n) = \prod_{t=1}^{n} p_\theta(y_t|x_t)$. In
this case the universal probability can actually be expressed as
$Q(y_1, \ldots, y_n|x_1, \ldots, x_n) = \prod_{t=1}^{n} q(y_t|y_1^{t-1}, x_1^t)$ and so it
provides a fully sequential, non-anticipating prediction procedure.

Finally, since $C_n$ is an attainable lower bound on the
performance of any learning and prediction algorithm, we
make the following claim: A class of conditional distributions
$\{p_\theta(y|x), \theta \in \Theta\}$ is learnable if and only if $C_n/n \to 0$ as the
data length $n \to \infty$. Thus, $C_n$ can replace other measures,
such as those of Vapnik and Chervonenkis, to determine the
complexity of a class of models.
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Abstract  —  We  survey  important  developments  in 
the  theory  of  covering  radius  during  the  period  1985- 
1994.  We  present  lower  bounds,  constructions  and  up¬ 
per  bounds,  the  linear  and  nonlinear  cases,  density 
and  asymptotic  results,  normality,  specific  classes  of 
codes,  covering  radius  and  dual  distance,  tables,  and 
open  problems. 


I.  Background 

Interest  in  covering  radius  has  grown  markedly  since  about 
1980.  The  topic  has  applications  to  problems  of  data  compres¬ 
sion,  testing,  and  write-once  memories.  It  is  also  interesting 
for  its  own  sake.  It  is  a  fundamental  geometric  parameter  of 
a  code,  characterizing  its  maximal  error  correcting  capability 
in  the  case  of  minimum  distance  decoding.  Although  some  of 
these  applications  are  recent,  others  are  old.  Yet  after  the  1960 
paper  of  Gorenstein,  Peterson,  and  Zierler  [4]  showing  that  the 
double-error-correcting  binary  BCH  code  has  covering  radius 
3,  (though  there  were  some  papers  on  the  football-pool  pro¬ 
blem),  there  was  nothing  on  covering  radius  until  the  seminal 
paper  [3]  of  Delsarte  in  1973. 

An earlier survey, [1], published in 1985, has seemingly
contributed  to  the  increase  in  the  number  of  papers  on  this 
topic  in  the  last  decade.  Covering  radius  has  evolved  into  a 
subject  in  its  own  right,  and  we  feel  the  need  to  give  a  sum¬ 
mary  of  many  works  on  covering  codes  that  have  appeared 
since  [1]. 


Section  7  deals  with  specific  classes  of  error  correcting 
codes,  among  which  are  Reed-Muller,  BCH  and  their  duals, 
cyclic,  self-dual,  and  algebraic-geometric  codes. 

Section  8  is  a  brief  account  of  relations  between  covering 
radius  and  dual  distance. 

Section  9,  on  generalizations  of  coverings,  treats  mixed, 
weighted,  and  multiple  coverings. 

In Section 10 we discuss the open problems of [1], add two
new  ones,  and  disprove  a  conjecture. 

We  provide  extensive  tables  of  bounds  for  coverings. 

In  our  bibliography  of  some  270  items  we  have  tried  to  in¬ 
clude  all  papers  bearing  on  the  covering  radius  of  block  codes. 
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II.  Plan  of  the  paper 

We  discuss  lower  bounds  in  Section  2,  mentioning  several 
methods  but  especially  linear  programming  and  the  method  of 
excess.  These  methods  usually  improve  on  the  sphere-covering 
bound. 

In  Section  3  we  discuss  asymptotic  density  of  coverings 
when  the  length  goes  to  infinity  while  the  radius  remains  fixed. 

In  Section  4  we  treat  upper  bounds  for  linear  codes,  fo¬ 
cusing  on  the  deficiency  of  a  code,  “worst”  codes  (useful  in 
designing  write-once  memories),  and  Griesmer,  optimum,  and 
maximum  codes. 

Section 5 discusses upper bounds obtained from constructions.
There are blockwise direct sums, amalgamated direct
sums, variants on the (u|u+v) construction, and simulated
annealing. This section closes with codes over mixed alphabets.

In Section 6 we discuss normality and some of its many
offshoots, closing with the conjecture $K(n+2, t+1) \le K(n, t)$.
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We  get  a  B-ordering  of  all  binary  ^-tuples  Yn 
by choosing an ordered basis $\{y_1, \ldots, y_n\}$ of $V_n$ and
ordering the n-tuples as follows: $0$, $y_1$, $y_2$, $y_2+y_1$, $y_3$,
$y_3+y_1$, $y_3+y_2$, $y_3+y_2+y_1$, $y_4$, .... Given a minimum
distance $d$, choose a set of vectors $S$ with the zero
vector first, then go through the vectors in their B-ordering
and choose the next vector which has distance
$d$ or more from all vectors already chosen. The
surprising result that $S$ is linear has been shown in
several different ways [1, 2, 3, 4, 6]. Linear codes found
in this fashion are called greedy codes.
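The greedy selection just described can be sketched in a few lines of Python. This is an illustrative brute-force sketch, not the authors' implementation: vectors are stored as integers (bit k standing for coordinate k), and the function names and the example bases are assumptions chosen for the illustration.

```python
def b_ordering(basis):
    """Yield all vectors (as ints) spanned by `basis` in B-order.

    basis[0] is y_1, basis[1] is y_2, ...; the i-th vector is the XOR of
    the basis vectors selected by the binary digits of i, which gives
    0, y1, y2, y2+y1, y3, y3+y1, ...
    """
    n = len(basis)
    for i in range(1 << n):
        v = 0
        for k in range(n):
            if (i >> k) & 1:
                v ^= basis[k]
        yield v

def greedy_code(basis, d):
    """Scan the B-ordering and keep every vector at Hamming distance
    at least d from all vectors kept so far (the zero vector comes first)."""
    code = []
    for v in b_ordering(basis):
        if all(bin(v ^ c).count("1") >= d for c in code):
            code.append(v)
    return code

# Unit-vector basis: the B-ordering is then the lexicographic order.
unit_basis = [1 << k for k in range(7)]
lexicode = greedy_code(unit_basis, 3)

# A triangular (non-unit) ordered basis, chosen arbitrarily for illustration.
tri_basis = [0b0000001, 0b0000011, 0b0000101, 0b0001110,
             0b0010001, 0b0100110, 0b1000011]
tri_code = greedy_code(tri_basis, 3)
```

For length 7 and d = 3 the lexicographic case gives 16 codewords, and both codes are closed under addition, in line with the linearity result cited above.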

An ordered basis $\{y_i\}$ of $V_n$ is called
triangular [1] if $y_i = (0, \ldots, 0, 1, *, \ldots, *)$, with the 1 in the
$i$th position. When the $y_i$ are unit vectors, the order is
the lexicographic order. The columns $h_n, h_{n-1}, \ldots, h_1$ of
the g-parity check matrix $H_n$ are constructed one by
one. We associate numbers with their binary
representations. We let $h_1$ be the number 1. Let $y_{i+1} =
(0, \ldots, 0, 1, \epsilon_i, \ldots, \epsilon_1)$, where the $\epsilon_j$ are 0 or 1. If $H_i =
[h_i, \ldots, h_1]$ is known, we let $\beta$ be the smallest number so
that $h_{i+1} = \beta + (\epsilon_i h_i + \cdots + \epsilon_1 h_1)$ is not a sum of $d-1$
or fewer columns of $H_i$. Then $H_{i+1} = [h_{i+1}, h_i, \ldots, h_1]$.

Each $H_i$ is a parity check matrix of the greedy code
chosen using the ordered basis $\{y_1, \ldots, y_i\}$ [1]. Further,
the syndrome of any vector with regard to $H_i$ is the
g-value which is assigned to it by generalizing the greedy
algorithm for choosing vectors in the code, hence the
name g-parity check matrix.

The non-binary case has also aroused quite a
bit of interest. One may generalize the concept of
B-ordering to the case of an arbitrary base field. For
example, in the case $GF(4) = \{0, 1, \omega, \omega^2\}$, the
B-ordering is generated by choosing an ordered basis
$\{y_1, \ldots, y_n\}$ of $V_n$ and ordering the n-tuples as follows: $0$,
$y_1$, $\omega y_1$, $\omega^2 y_1$, $y_2$, $y_2+y_1$, $y_2+\omega y_1$, $y_2+\omega^2 y_1$, $\omega y_2$,
$\omega y_2+y_1$, $\omega y_2+\omega y_1$, $\omega y_2+\omega^2 y_1$, $\omega^2 y_2$, $\omega^2 y_2+y_1$,
$\omega^2 y_2+\omega y_1$, $\omega^2 y_2+\omega^2 y_1$, .... The greedy code is then
generated from the B-ordering as in the binary case. It
has been shown by Conway and Sloane in the case of
lexicodes [2], and independently by Fon-Der-Flaass [3]
and Van Zanten [6] in the case of general greedy codes,
that those codes for which the base field is of order
$2^{2^a}$ are linear. When the base field is not of order $2^{2^a}$,
the situation is a little less clear. In general, the greedy
codes generated in this case have been linear only for
small $n$. In every case examined, linearity breaks down
at some point early in the generation of the code. It is
possible, however, to extend the parity check matrix
generating algorithm to this case. Although this
algorithm does not produce the greedy code itself, it
still produces a very good code which is generated in a
greedy-like fashion.

This work was supported in part by NSA grant MDA
904-91-H-0003.

The parity check matrix is generated in the
same way as in the binary case. This algorithm also
assumes that the ordered basis $\{y_i\}$ of $V_n$ being used is
triangular, and that the first non-zero entry in each basis
vector is 1. Then if $H_i = [h_i, \ldots, h_1]$ is known, we let $\beta$
be the smallest number so that $h_{i+1} = \beta + (\epsilon_i h_i +
\cdots + \epsilon_1 h_1)$ is not a linear combination of $d-1$ or fewer
columns of $H_i$. Then $H_{i+1} = [h_{i+1}, h_i, \ldots, h_1]$.

Many  interesting  codes  are  generated  via  the 
parity  check  algorithm.  We  have  generated  many  such 
parity  check  matrices  via  the  computer  for  base  fields  of 
orders  3,  4,  and  5.  In  all  examined  cases,  the  codes 
generated  have  had  dimension  within  1  of  the  best 
known  codes,  for  a  given  n  and  d,  and  most  of  the  codes 
generated  had  dimension  equal  to  that  of  the  best 
known  codes.  Better  yet,  we  have  generated  more  than 
100  record  breaking  codes  over  the  base  field  of  order  4 
[5].  Most  of  these  are  shortened  codes  of  larger  greedy 
codes.  The  following  table  lists  the  parameters  of  the 
codes  from  which  the  shortened  codes  are  derived. 

Table. Parameters of record breaking codes over GF(4)
obtained via the parity check matrix algorithm.

  n     k    d
 52    44    5
128   118    5
 35    26    6
 71    60    6
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Abstract  —  This  papers  presents  4  methods  of  gen¬ 
erating  a  Lee  distance  Gray  code.  The  first  two  meth¬ 
ods  presented  are  for  radix  k  numbers,  and  the  other 
two  methods  are  for  mixed  radix  numbers. 

I.  Introduction 

Some recently developed parallel machines have a multidimensional
torus topology for their processor interconnection
structure. Many algorithms can be implemented efficiently by
embedding a Hamiltonian cycle or a Hamiltonian path within
this topology. This correspondence addresses the embedding
problem by presenting four methods of constructing a Lee
distance Gray code. For each method, let $R = (r_{n-1} r_{n-2} \cdots r_0)$
be a number in radix notation, and let $G = (g_{n-1} g_{n-2} \cdots g_0)$
be the Gray code representation given by $f_i$, i.e., $G = f_i(R)$.

II.  Single  Radix  Codes 

First, assume there are $n$ dimensions, each having the same
number of processors, $k$, where $k \ge 3$. Each processor
node is labeled with a distinct $n$-digit, radix $k$ vector
$(r_{n-1} r_{n-2} \cdots r_0)$, where $r_i < k$ for $0 \le i \le n-1$. Two
nodes, $A = (a_{n-1} a_{n-2} \cdots a_0)$ and $B = (b_{n-1} b_{n-2} \cdots b_0)$, are
adjacent if the Lee distance between them, $D_L(A, B)$, is one.
Lee distance is defined as

$$D_L(A, B) = \sum_{i=0}^{n-1} \min\bigl((a_i - b_i) \bmod k,\ (b_i - a_i) \bmod k\bigr).$$

Two methods are given below for constructing a Gray code
based on the assumption of the previous paragraph.

Method 1:

$$f_1(r_{n-1} r_{n-2} \cdots r_0) = r_{n-1}\,(r_{n-2} - r_{n-1}) \cdots (r_0 - r_1),$$

where each difference is taken modulo $k$.

Method 2: This method produces a Hamiltonian cycle if $k$
is even, and a Hamiltonian path if $k$ is odd. Let $\bar{r}_i = k - 1 - r_i$,
and let $g_{n-1} = r_{n-1}$. Then, for $i = n-2, \ldots, 0$,
if $k$ is even then

$$g_i = \begin{cases} r_i, & \text{if } r_{i+1} \text{ is even} \\ \bar{r}_i, & \text{otherwise} \end{cases}$$

or, if $k$ is odd, let $\hat{r}_i = \sum_{j=i+1}^{n-1} r_j$, and

$$g_i = \begin{cases} r_i, & \text{if } \hat{r}_i \text{ is even} \\ \bar{r}_i, & \text{otherwise} \end{cases}$$
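The two single-radix methods can be checked by brute force. The Python sketch below is an illustration under stated assumptions: digits are stored most significant first, Method 1 is read as $g_i = (r_i - r_{i+1}) \bmod k$, and Method 2 is read as the reflected code whose reflection test is the parity of $r_{i+1}$ (k even) or of the sum of all higher digits (k odd). All function names are hypothetical.

```python
from itertools import product

def lee_distance(a, b, k):
    """Lee distance between two radix-k vectors of equal length."""
    return sum(min((x - y) % k, (y - x) % k) for x, y in zip(a, b))

def method1(r, k):
    """r = (r_{n-1}, ..., r_0); g_{n-1} = r_{n-1}, g_i = (r_i - r_{i+1}) mod k."""
    g = [r[0]]
    for i in range(1, len(r)):
        g.append((r[i] - r[i - 1]) % k)
    return tuple(g)

def method2(r, k):
    """Reflected code: keep r_i or replace it by k-1-r_i depending on a parity."""
    g = [r[0]]
    for i in range(1, len(r)):
        # k even: parity of the next higher digit; k odd: parity of the sum
        # of all higher digits.
        parity = r[i - 1] if k % 2 == 0 else sum(r[:i])
        g.append(r[i] if parity % 2 == 0 else k - 1 - r[i])
    return tuple(g)

def gray_sequence(method, k, n):
    """Apply `method` to 0, 1, ..., k^n - 1 written in radix k."""
    return [method(r, k) for r in product(range(k), repeat=n)]

def is_gray(seq, k, cyclic):
    """True if all words are distinct and consecutive words (and, if
    cyclic, the last and first) are at Lee distance one."""
    pairs = list(zip(seq, seq[1:]))
    if cyclic:
        pairs.append((seq[-1], seq[0]))
    return len(set(seq)) == len(seq) and all(
        lee_distance(a, b, k) == 1 for a, b in pairs)
```

For example, `is_gray(gray_sequence(method2, 4, 3), 4, cyclic=True)` checks the Hamiltonian-cycle claim for an even radix, while for k = 3 only the path property (`cyclic=False`) holds.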


1This  work  is  supported  in  part  by  the  National  Science  Foun¬ 
dation  under  Grant  MIP-9404924. 


III.  Mixed  Radix  Codes 

In many cases, however, the number of processors per dimension
varies. Let $K = k_{n-1} k_{n-2} \cdots k_0$ be an $n$-dimensional
vector where $k_i$ is the radix of dimension $i$ and $k_i \ge 3$ for
$0 \le i \le n-1$. In this case, Method 3 gives a Gray code design
resulting in a Hamiltonian cycle if $k_i$ is even for at least
one value of $i$. If each $k_i$ is odd, the resulting Gray code
produces a Hamiltonian path. Method 4 produces a Hamiltonian
cycle if all $k_i$ are odd.

Let each processor node be labeled with a distinct $n$-digit
vector $R = (r_{n-1} r_{n-2} \cdots r_0)$, where $0 \le r_i \le k_i - 1$ for $i =
0, 1, \ldots, n-1$. Vector $R$ is said to be in mixed-radix notation,
and the integer value of $R$ is given by

$$I(R) = r_0 + r_1 k_0 + r_2 k_0 k_1 + \cdots + r_{n-1} k_0 k_1 \cdots k_{n-2}
= \sum_{i=1}^{n-1} \Bigl( r_i \prod_{j=0}^{i-1} k_j \Bigr) + r_0.$$

In mixed-radix notation, the Lee distance, $D_L(A, B)$, between
$A = (a_{n-1} a_{n-2} \cdots a_0)$ and $B = (b_{n-1} b_{n-2} \cdots b_0)$ is
defined as

$$D_L(A, B) = \sum_{i=0}^{n-1} \min\bigl((a_i - b_i) \bmod k_i,\ (b_i - a_i) \bmod k_i\bigr).$$

Method 3: Assume that at least one of the $k_i$'s is even.
Without loss of generality, assume that the dimensions are
ordered so that if $k_i$ is even and $k_j$ is odd, then $i > j$. Let
$\ell$ be the index of the lowest even dimension. That is, the
dimensions are ordered as follows: $k_{n-1} \cdots k_\ell$ are even and
$k_{\ell-1} \cdots k_0$ are odd.

Now, letting $\bar{r}_i = k_i - 1 - r_i$ and $\hat{r}_i = \sum_{j=i+1}^{\ell} r_j$, $f_3$ is defined
as follows.

$g_{n-1} = r_{n-1}$, and

for $i = n-2$ downto $\ell$: $g_i = r_i$ if $r_{i+1}$ is even, $\bar{r}_i$ otherwise;

for $i = \ell-1$ downto $0$: $g_i = r_i$ if $\hat{r}_i$ is even, $\bar{r}_i$ otherwise.

Method 4: Assume that $k_i$ is odd for $0 \le i \le n-1$, and that
the dimensions are ordered such that $k_{n-1} \ge k_{n-2} \ge \cdots \ge k_0$.
Also, define

$$\tilde{r}_i = \begin{cases} r_i, & \text{if } r_{i+1} \text{ is odd} \\ k_i - 1 - r_i, & \text{otherwise} \end{cases}$$

Now, $f_4$, which produces a Gray code yielding a Hamiltonian
cycle, is defined as follows.

$g_{n-1} = r_{n-1}$, and for $0 \le i \le n-2$

$$g_i = \begin{cases} (r_i - r_{i+1}) \bmod k_i, & \text{if } r_{i+1} < k_i \\ \tilde{r}_i, & \text{otherwise} \end{cases}$$
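The mixed-radix methods can be exercised the same way. The Python sketch below rests on an assumed reading of the two definitions, since the printed formulas are only partly legible: for Method 3 the parity test below the lowest even dimension uses the sum of the digits from that even dimension down to the next higher digit, and for Method 4 a digit is kept when the next higher digit is odd and reflected when it is even (once that digit is at least $k_i$). Digits and radices are stored most significant first, and the names are hypothetical.

```python
from itertools import product

def lee_distance(a, b, radices):
    """Mixed-radix Lee distance: each digit is compared modulo its own radix."""
    return sum(min((x - y) % k, (y - x) % k)
               for x, y, k in zip(a, b, radices))

def method3(r, radices):
    """Assumes all even radices are listed before (above) the odd ones."""
    # Position of the lowest even dimension in the most-significant-first tuple.
    ell = max(i for i, k in enumerate(radices) if k % 2 == 0)
    g = [r[0]]
    for i in range(1, len(r)):
        # Even part: parity of the next higher digit.
        # Odd part: parity of the digit sum from the lowest even dimension down.
        parity = r[i - 1] if i <= ell else sum(r[ell:i])
        g.append(r[i] if parity % 2 == 0 else radices[i] - 1 - r[i])
    return tuple(g)

def method4(r, radices):
    """Assumes all radices are odd and listed in non-increasing order."""
    g = [r[0]]
    for i in range(1, len(r)):
        k, higher = radices[i], r[i - 1]
        if higher < k:
            g.append((r[i] - higher) % k)
        elif higher % 2 == 1:
            g.append(r[i])
        else:
            g.append(k - 1 - r[i])
    return tuple(g)

def is_gray_cycle(method, radices):
    """Brute-force check that the images of 0, 1, 2, ... form a Hamiltonian cycle."""
    seq = [method(r, radices) for r in product(*(range(k) for k in radices))]
    pairs = zip(seq, seq[1:] + seq[:1])
    return len(set(seq)) == len(seq) and all(
        lee_distance(a, b, radices) == 1 for a, b in pairs)
```

A call such as `is_gray_cycle(method4, (7, 5, 3))` enumerates all 105 labels and confirms that consecutive codewords, including the wrap-around pair, are at Lee distance one.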
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Abstract  —  The  properties  of  introduced  diffuse  dif¬ 
ference  triangle  sets  (DTS)  are  considered. 

I.  Definitions 

An $(I, J)$-set is a set $\Sigma = \{\Sigma_1, \Sigma_2, \ldots, \Sigma_I\}$ where $\Sigma_i = \{\sigma_{ij} \mid
0 \le j \le J\}$ for $1 \le i \le I$ and the elements are integers
such that $\sigma_{i0} = 0$ for $1 \le i \le I$, and $\sigma_{ij} < \sigma_{i,j+1}$ for $1 \le i \le
I$ and $0 \le j < J$. Let $m(\Sigma) = \max\{\sigma_{iJ} \mid 1 \le i \le I\}$ and
$\mu(\Sigma) = \sum_{i=1}^{I} \sigma_{iJ}$. An $(I, J)$-DTS (in normalized form) is an
$(I, J)$-set $\Sigma$ such that all the differences $\sigma_{ij} - \sigma_{ij'}$ with $1 \le
i \le I$ and $0 \le j' < j \le J$ are distinct.

A diffuse DTS satisfies some additional conditions:

$\sigma_{i,j+1} \ge \sigma_{ij} + \delta$ for $1 \le i \le I$ and $0 \le j < J$, (1)

$|\sigma_{ij} - \sigma_{i'j'}| \ge \delta_c$ for $1 \le i \ne i' \le I$ and $0 \le j, j' \le J$ (2)

except when $j = j' = 0$.

An $(I, J)$-DTS satisfying (1) and (2) is called an $(I, J, \delta, \delta_c)$-DTS.
The set of $(I, J, \delta, \delta_c)$-DTS is denoted by $S(I, J, \delta, \delta_c)$.
We note that for $I = 1$ the condition (2) is empty; we
will put $\delta_c = 0$ in this case. For applications, which are
mainly found in the constructions of diffuse codes [1], we
want an $(I, J, \delta, \delta_c)$-DTS $\Sigma$ with $m(\Sigma)$ as small as possible. Let
$m(I, J, \delta, \delta_c) = \min\{m(\Sigma) \mid \Sigma \in S(I, J, \delta, \delta_c)\}$. If $m(\Sigma) =
m(I, J, \delta, \delta_c)$, then $\Sigma$ is called optimal. Similarly, define
$\mu(I, J, \delta, \delta_c) = \min\{\mu(\Sigma) \mid \Sigma \in S(I, J, \delta, \delta_c)\}$.

We will study here the structure of the set $S(I, J, \delta, \delta_c)$
when one or both of $\delta$, $\delta_c$ are increasing and the other
parameters are kept fixed.
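The definitions above translate directly into a membership test. The following Python sketch is an illustration only: it assumes each set is given as an increasing list of integers starting with 0, that condition (1) bounds the gap between consecutive elements by $\delta$, and that condition (2) bounds the gap between elements of different sets by $\delta_c$. The function names are hypothetical.

```python
from itertools import combinations

def is_dts(sets):
    """All differences sigma_ij - sigma_ij' (j' < j), over all sets, are distinct."""
    diffs = [b - a for s in sets for a, b in combinations(s, 2)]
    return len(diffs) == len(set(diffs))

def is_diffuse_dts(sets, delta, delta_c):
    """Check the DTS property together with conditions (1) and (2)."""
    if not is_dts(sets):
        return False
    # Condition (1): consecutive elements of a set are at least delta apart.
    if any(b - a < delta for s in sets for a, b in zip(s, s[1:])):
        return False
    # Condition (2): elements of different sets are at least delta_c apart,
    # except for the common zero elements (j = j' = 0).
    for s, t in combinations(sets, 2):
        for j, x in enumerate(s):
            for jp, y in enumerate(t):
                if (j, jp) != (0, 0) and abs(x - y) < delta_c:
                    return False
    return True

def stretch(sets, delta):
    """Add j * delta to the j-th element of every set."""
    return [[x + j * delta for j, x in enumerate(s)] for s in sets]
```

For instance, `[[0, 3, 7], [0, 5, 11]]` passes `is_diffuse_dts` with `delta=3, delta_c=2` but fails with `delta_c=3`, because 3 and 5 are only two apart.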

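II. Increasing $\delta$

Let $G(I, J, \delta_c)$ denote the set of $(I, J)$-sets $\Gamma = \{\Gamma_1, \Gamma_2, \ldots, \Gamma_I\}$
where the $\Gamma_i = \{\gamma_{ij} \mid 0 \le j \le J\}$ are such that for each fixed
$l$, where $0 < l \le J$, all the differences $\gamma_{i,j+l} - \gamma_{ij}$ with $1 \le i \le
I$ and $0 \le j \le J - l$ are distinct, and $|\gamma_{ij} - \gamma_{i'j}| \ge \delta_c$ for $1 \le
i \ne i' \le I$ and $1 \le j \le J$.

Let $g(I, J, \delta_c) = \min\{m(\Gamma) \mid \Gamma \in G(I, J, \delta_c)\}$. For $\Gamma \in
G(I, J, \delta_c)$ and $\delta \ge 0$, define $\Sigma = f_\delta(\Gamma)$ by $\sigma_{ij} = \gamma_{ij} +
j\delta$ for $1 \le i \le I$ and $0 \le j \le J$. We note that $m(f_\delta(\Gamma)) =
m(\Gamma) + J\delta$.

Lemma 1 If $\Sigma \in S(I, J, \delta, \delta_c)$, then $\Sigma = f_\delta(\Gamma)$ for some $\Gamma \in
G(I, J, \delta_c)$.

Lemma 2 For each $\Gamma \in G(I, J, \delta_c)$ there exists a bound $\delta_0(\Gamma)$
such that $f_\delta(\Gamma) \in S(I, J, \delta, \delta_c)$ for $\delta \ge \delta_0(\Gamma)$.

Based on Lemmata 1 and 2, we give the following

Theorem 1 For given $I$, $J$, $\delta_c$, and $\zeta \ge 0$, there exists a
bound $\delta_0(I, J, \delta_c, \zeta)$ such that

a) $m(I, J, \delta, \delta_c) = g(I, J, \delta_c) + J\delta$ for $\delta \ge \delta_0(I, J, \delta_c, 0)$,

b) for $\delta \ge \delta_0(I, J, \delta_c, \zeta)$ we have

$\{\Sigma \in S(I, J, \delta, \delta_c) \mid m(\Sigma) = m(I, J, \delta, \delta_c) + \zeta\}$

$= \{f_\delta(\Gamma) \mid \Gamma \in G(I, J, \delta_c) \text{ and } m(\Gamma) = g(I, J, \delta_c) + \zeta\}$.

Corollary 1 For $\delta \ge \delta_0(I, J, \delta_c, \zeta)$, the size of $\{\Sigma \in S(I, J,
\delta, \delta_c) \mid m(\Sigma) = m(I, J, \delta, \delta_c) + \zeta\}$ is independent of $\delta$.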


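III. Increasing $\delta_c$

We can assume without loss of generality that for $\Sigma \in
S(I, J, \delta, \delta_c)$ we have $\sigma_{i1} < \sigma_{i+1,1}$ for $1 \le i < I$. With this
assumption, we partition the set $S(I, J, \delta, \delta_c)$ into two sets:
$S_1(I, J, \delta, \delta_c) = \{\Sigma \in S(I, J, \delta, \delta_c) \mid \sigma_{iJ} < \sigma_{i+1,1}$ for $1 \le i <
I\}$, $S_2(I, J, \delta, \delta_c) = S(I, J, \delta, \delta_c) \setminus S_1(I, J, \delta, \delta_c)$.

For $\Theta \in S(I, J-1, \delta, 0)$ and non-negative integers $\delta_1, \delta_2, \ldots,
\delta_I$, define $\Sigma = h(\Theta, \delta_1, \delta_2, \ldots, \delta_I)$ by $\sigma_{i0} = 0$ for $1 \le i \le I$,
$\sigma_{ij} = \sum_{l=1}^{i} \delta_l + \sum_{l=1}^{i-1} \theta_{l,J-1} + \theta_{i,j-1}$ for $1 \le i \le I$ and $1 \le
j \le J$. We note that $m(\Sigma) = \sum_{i=1}^{I} \delta_i + \mu(\Theta)$.

Lemma 3 If $\Sigma \in S_1(I, J, \delta, \delta_c)$, then there exist a $\Theta \in
S(I, J-1, \delta, 0)$ and non-negative integers $\delta_i \ge \delta_c$ for $1 \le i \le I$
such that $\Sigma = h(\Theta, \delta_1, \delta_2, \ldots, \delta_I)$. In particular, $m(\Sigma) \ge
\mu(I, J-1, \delta, 0) + I\delta_c$.

Lemma 4 For each $\Theta \in S(I, J-1, \delta, 0)$ there exists a bound
$\delta_{c0}(\Theta)$ such that if $\delta_i \ge \delta_c \ge \delta_{c0}(\Theta)$ for $1 \le i \le I$, then
$h(\Theta, \delta_1, \delta_2, \ldots, \delta_I) \in S_1(I, J, \delta, \delta_c)$.

Lemma 5 If $\Sigma \in S_2(I, J, \delta, \delta_c)$, then $m(\Sigma) \ge (I+1)\delta_c$.

Based on Lemmata 3, 4, and 5, we obtain

Theorem 2 For given $I$, $J$, and $\delta$ there exists a bound
$\delta_{c0}(I, J, \delta)$ such that if $\delta_c \ge \delta_{c0}(I, J, \delta)$, then $m(I, J, \delta, \delta_c) =
\mu(I, J-1, \delta, 0) + I\delta_c$.

Corollary 2 For $\delta_c \ge \delta_{c0}(I, J, \delta, \zeta)$, the size of $\{\Sigma \in
S(I, J, \delta, \delta_c) \mid m(\Sigma) = m(I, J, \delta, \delta_c) + \zeta\}$ is independent of $\delta_c$.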

IV. Both $\delta$ and $\delta_c$ increasing

Combining Lemmas 2 and 4, we get the following Lemma.

Lemma 6 If $\Gamma \in G(I, J-1, 0)$, $\delta \ge \delta_0(\Gamma)$, and $\delta_c \ge
\delta_{c0}(f_\delta(\Gamma))$, then $\Sigma = h(f_\delta(\Gamma), \delta_c, \delta_c, \ldots, \delta_c) \in S(I, J, \delta, \delta_c)$,
and $m(\Sigma) = \mu(\Gamma) + (J-1)I\delta + I\delta_c$.

Lemma 7 a) If $\delta \le \delta_c$, then $m(I, J, \delta, \delta_c) \ge I(J-1)\delta + I\delta_c$.
b) If $\delta \ge \delta_c$, then $m(I, J, \delta, \delta_c) \ge (IJ-1)\delta_c + \delta$.

We note that for $\delta_c \ge \delta$, the lower bound in Lemma 7 a)
and the upper bound implied by Lemma 6 differ by a constant
independent of $\delta$ and $\delta_c$. Based on this observation and
support from numerical data, we put forward the following
conjectures:

Conjecture 1 For given $I$ and $J$ there exist bounds $\delta(I, J)$
and $\Delta(I, J)$ and a constant $\nu(I, J)$ such that $m(I, J, \delta, \delta_c) =
\nu(I, J) + (J-1)I\delta + I\delta_c$ for $\delta \ge \delta(I, J)$ and $\delta_c \ge \delta + \Delta(I, J)$.

Conjecture 2 For given $I$, $J$, and $l$, there exists a bound
$\delta(I, J, l)$ and a constant $\nu(I, J, l)$ such that $m(I, J, \delta, \delta + l) =
\nu(I, J, l) + IJ\delta$ for $\delta \ge \delta(I, J, l)$.

If both conjectures are true, then $\nu(I, J, l) = \nu(I, J) + Il$
for $l \ge \Delta(I, J)$. Hence, three of four conjectures formulated
in [1] are proved.
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[l]  presented  a  new  construction  method  for  binary 
constant-weight cyclic codes. By slightly modifying
this method, we could construct new constant-weight
codes (not necessarily cyclic). Furthermore, two
classes of binary optimum constant-weight codes could
be constructed by using this modified method. In general,
we show that binary optimum constant-weight
codes, which achieve the Johnson bound, could be
constructed from codes over GF(q) which achieve the Plotkin
bound.

The cyclic order of $a = (a_0, \ldots, a_{N-1}) \in [GF(2)]^N$
is denoted as $t(a)$, i.e. the smallest positive integer
$t$ such that $a = S^t(a) = (a_t, \ldots, a_{N-1}, a_0, \ldots, a_{t-1})$.
It is clear that $A(a) = \{a, S(a), \ldots, S^{t(a)-1}(a)\}$ forms
a binary constant-weight code with length $N$ and weight
$w(a)$; its minimum distance is denoted as $d(a)$.
Given an $(n, M, d)$ code $C$ in $GF(q)$, $v \in [GF(2)]^N$
with cyclic order $q$, and a one-to-one mapping $f$:
$GF(q) \to A(v)$, denote

$$C(v, f) = \{(f(c_0), \ldots, f(c_{n-1})) \mid c = (c_0, \ldots, c_{n-1}) \in C\}$$

Proposition 1 $C(v, f)$ is a binary constant-weight
code with length $nN$, weight $nw(v)$, minimum distance
$d(v)d$, and codeword number $M$.

Proposition 2

$$A_2(nN, d(v)d, nw(v)) \ge A_q(n, d)$$

Construction 1 (ref. [1]) $\alpha = (1, 0, \ldots, 0) \in
[GF(2)]^q$, $t(\alpha) = q$, $w(\alpha) = 1$, $d(\alpha) = 2$
Construction  2  (  ref.  [1]  )  q  =  p,  prime,  and 

is  odd,  8  d=  Legendre  sequence  of  length  p, 

m = <p)  =  2y1.  m  =  e±i 

Proposition 3 (1) $A_2(nq, 2d, n) \ge A_q(n, d)$

(2)  if  p  is  prime,  and  is  odd,  then 

A2[np,d^Y^,n^~)  >  Ap{n,d) 

Lower bounds for $A_2(n^*, d^*, w)$ could be obtained from
lower bounds for $A_q(n, d)$, e.g. the Gilbert-Varshamov
bound, and from optimum codes in $GF(q)$, e.g. Hamming
codes, Golay codes, R-S codes, MDS codes, simplex
codes.
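Construction 1 amounts to replacing each q-ary symbol by a distinct cyclic shift of $(1, 0, \ldots, 0)$. The Python sketch below illustrates this under stated assumptions: field elements are labelled by the integers 0, ..., q-1, the one-to-one mapping sends symbol s to the shift with its single 1 in position s, and the ternary example code (the words $(a, b, a+b, a+2b)$ over GF(3)) is chosen here only for illustration.

```python
from itertools import combinations

def one_hot(symbol, q):
    """Cyclic shift of (1, 0, ..., 0) with the single 1 moved to `symbol`."""
    return tuple(1 if i == symbol else 0 for i in range(q))

def expand(code, q):
    """Replace every q-ary symbol of every codeword by its binary image."""
    return [tuple(bit for c in word for bit in one_hot(c, q)) for word in code]

def min_distance(code):
    """Minimum Hamming distance over all pairs of distinct codewords."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(code, 2))

# Illustrative ternary code with n = 4, M = 9, d = 3.
ternary = [(a, b, (a + b) % 3, (a + 2 * b) % 3)
           for a in range(3) for b in range(3)]
binary = expand(ternary, 3)
```

Here the binary image has length 12, constant weight 4, nine codewords, and minimum distance 6, matching the parameters $(nq, 2d, n)$ of Proposition 3 (1).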


Proposition 4 If $C$ is an optimum $(n, M, d)$ code in
$GF(q)$ which achieves the Plotkin bound, i.e. $M =
d/[d - n(q-1)/q]$, $d > n(q-1)/q$, then $C(\alpha, f)$ and
$C(\beta, f)$ are binary optimum constant-weight codes
which achieve the Johnson bound.

Generalized Hadamard matrices in $GF(q)$ could be used
to construct codes in $GF(q)$ which achieve the Plotkin
bound, e.g. ref. [2]. If we take $C$ to be the simplex code,
i.e. the dual code of the Hamming code in $GF(q)$, we obtain
two classes of binary optimum constant-weight codes.

Proposition  5  (1)  A2{q^~^,  2qm~l, 

(2)  if  p  is  prime}  and  is  odd ,  then 

Pm~l  nm— 1P+1  Pm~lp  +  U  m 

p-l  'P  2  5  p  —  1  2  P 

If $C$ is a binary optimum code which achieves the Plotkin
bound, then $C(\alpha, f)$ is an optimum balanced
error-correcting code; therefore we could use Hadamard
matrices to construct optimum balanced error-correcting
codes.
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Abstract  —  Linear  spaces  of  n  x  n  x  n  tensors  over 
finite fields are investigated where the rank of any
nonzero tensor in the space is at least a prescribed
number $\mu$. Such spaces can recover any $n \times n \times n$ tensor
of rank $\le (\mu - 1)/2$, and, as such, they can be used
to correct three-way crisscross errors. Bounds on the
dimensions of such spaces are given for $\mu \le 2n+1$, and
constructions are provided for $\mu \le 2n-1$ with redundancy
which is linear in $n$. These constructions can
be generalized to spaces of $n \times n \times \cdots \times n$ hyper-arrays.


II.  Bounds 

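Theorem 1. For any $3$-$[n^{\times\Delta}, k]$ hyper-array code over $GF(q)$,
$n^\Delta - k \ge \Delta n - (\Delta - 1)\log_q(q-1) - O(\Delta/(q^n \log q))$.

Theorem 2. Let $\mu \le 2n+1$. Then, for every $\mu$-$[n^{\times\Delta}, k]$
hyper-array code,

$n^\Delta - k \ge \Delta \lfloor(\mu - 1)/2\rfloor \, n \, (1 - \epsilon_\Delta(n))$,
where $\lim_{n \to \infty} \epsilon_\Delta(n) = 0$.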


I.  Introduction 

An $n \times n \times n$ tensor over a field $F$ is an $n \times n \times n$ array
$\Gamma = [\Gamma_{i,j,l}]_{i,j,l=1}^{n}$ whose entries are in $F$. A tensor $\Gamma =
[\Gamma_{i,j,l}]_{i,j,l=1}^{n}$ over $F$ is called a rank-one tensor if there exist
three nonzero vectors $[a_i]_{i=1}^{n}$, $[b_j]_{j=1}^{n}$, and $[c_l]_{l=1}^{n}$ over $F$
such that $\Gamma_{i,j,l} = a_i b_j c_l$ for $i, j, l = 1, 2, \ldots, n$. The rank of
an $n \times n \times n$ tensor $\Gamma$ is the smallest number $\rho$ of rank-one
tensors $\Gamma_m$ such that $\Gamma = \sum_{m=1}^{\rho} \Gamma_m$. The definition of tensor
rank is a generalization of that of matrix rank and can be
extended to $n^{\times\Delta}$ hyper-arrays over $F$.

A $\mu$-$[n^{\times\Delta}, k]$ hyper-array code $C$ over a field $F$ is a
$k$-dimensional linear subspace of the vector space of all $n^{\times\Delta}$
hyper-arrays over $F$ where $\mu$ is the smallest rank of any
nonzero hyper-array in $C$. We call $n^\Delta - k$ the redundancy of
$C$ and $\mu$ the minimum rank of $C$. We will use the terms array
codes and tensor codes for the cases $\Delta = 2$ and $\Delta = 3$,
respectively.

The minimum-rank Singleton bound for $\mu$-$[n^{\times\Delta}, k]$ hyper-array
codes over a field $F$ takes the form

$n^\Delta - k \ge (\mu - 1)\, n$.

This bound was stated by Delsarte in [1] for the case $\Delta = 2$.
Furthermore, Delsarte obtained a construction of $\mu$-$[n \times n, k]$
array codes over $GF(q)$ that attains this bound for every $\mu \le n$
(see also [2] and [4]).

In [3] and [4], it was shown how a certain model of errors,
so-called crisscross errors, can be handled optimally by
using such array codes. A discussion was given in [4] also for
larger $\Delta$. There are various applications of the crisscross error
model. In particular, the three-way crisscross model of errors
in tensors (i.e., the case $\Delta = 3$) can be found in practice in
certain memory chips. Tensor rank is closely related also to
the multiplicative complexity of sets of bilinear forms, such as
polynomial multiplication or matrix multiplication.

The purpose of this work is to continue the work of [1],
[2], and [4] and present constructions of linear spaces of $n^{\times\Delta}$
hyper-arrays for $\Delta \ge 3$ while obtaining bounds on the dimensions
of such spaces. We mainly concentrate on bounds and
constructions of $\mu$-$[n^{\times\Delta}, k]$ hyper-array codes over finite fields
with $\mu = O(n)$.


III.  Construction  of  tensor  codes 

Let  and  {u^}7=1  be  three  bases  of 

GF($q^n$) over GF($q$). Define the tensor code $C(n, p, 3; q)$ as
the  set  of  all  tensors  T  =  over  GF(q)  such  that 


13]  “t  =  0 . 


where $r$ and $s$ range over all nonnegative integers such that
(a) $0 \le r, s < n$, and (b) there exists a (conventional) linear
$[p-1, r+1]$ code over GF($q$) with minimum Hamming distance
$s+1$. In particular, by the Singleton bound on the minimum
Hamming distance we have $r + s \le p - 2$. Hence, we obtain
the following upper bound on the redundancy of $C(n, p, 3; q)$:


$$n^3 - k \le \begin{cases} \binom{p}{2}\, n & \text{for } p = 1, 2, \ldots, n \\ \left(n^2 - \binom{2n-p+1}{2}\right) n & \text{for } p = n+1, \ldots, 2n-1 \end{cases}$$
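
The counting behind this bound can be checked numerically. The sketch below counts the index pairs (r, s) with 0 <= r, s < n and r + s <= p - 2; condition (b) can only remove pairs, so the count is an upper bound. It assumes that each pair contributes one equation over GF(q^n), that is, n linear conditions over GF(q). The function name is hypothetical.

```python
from math import comb

def num_index_pairs(n, p):
    """Count pairs (r, s) with 0 <= r, s < n and r + s <= p - 2."""
    return sum(1 for r in range(n) for s in range(n) if r + s <= p - 2)

n = 6
small = [num_index_pairs(n, p) for p in range(1, n + 1)]
large = [num_index_pairs(n, p) for p in range(n + 1, 2 * n)]
print(small, large)
```

Multiplying each count by n gives the corresponding redundancy bound under the stated assumption.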


Theorem  3.  The  minimum  rank  of  C(n,  p,  3;  q)  is  at 
least  p. 

Generalizing the construction for any $A$, we can obtain
$p$-$[n^{\times A}, k]$ hyper-array codes $C(n, p, A; q)$ whose redundancy
is bounded from above by $\binom{p+A-3}{A-1}\, n$. For $A = 2$, the codes
$C(n, p, 2; q)$ coincide with those of Delsarte [1].

The construction $C(n, p, A; q)$ attains the Singleton bound
when $p = 2$. For $p = 3$ we get redundancy $n^A - k = An$,
which, in view of Theorem 1, is optimal over GF(2) for any
fixed $A$ and sufficiently large $n$. In general, for any fixed $p$, the
redundancy of $C(n, p, A; q)$ is linear in $n$, which is smaller than
a redundancy proportional to $n \log n$ that would be needed
in the simpler skewing crisscross coding method.

Abstract — Based on the average weight distribution of linearly
expanded codes, we study their asymptotic characteristics.

I.  Introduction 

In  this  paper,  we  study  the  asymptotic  properties  of  linearly  expanded 
(LE)  maximum  distance  separable  (MDS)  codes.  We  show  that  there 
is  a  class  of  LE  MDS  codes  in  which  most  members  are  asymptotically 
good.  A  time-varying  code  is  also  discussed,  based  on  the  asymptotic 
goodness  of  LE  MDS  codes. 

II.  Average  Weight  Distribution 
The average weight distribution (AWD) of LE codes is defined as
follows. Pick any (N, K, D) block code C over GF($q^m$), and list
all nonzero N-tuples over the multiplicative group of GF($q^m$). With
each N-tuple, multiply the columns of the code, which yields a total
of $(q^m - 1)^N$ block codes over GF($q^m$). Finally, expand each code
with a fixed basis, to obtain a class Cx of (n, k) q-ary LE codes,
where n = mN and k = mK. Consider now the q-ary weight
i, $0 \le i \le mN$. Let $G_i$ denote the average number of weight-i
codewords in a code in Cx. We refer to the set $\{G_i\}$ as the q-ary
AWD of Cx. The sum $G_{h \to j} = \sum_{i=h}^{j} G_i$ is called the cumulative
AWD (CAWD) of Cx between the weights h and j, $h \le j$. $G_i$ has
been derived for a class of generalised Reed-Solomon (GRS) codes
[2], where $N = 2^m - 1$ and q = 2. More general expressions for $G_i$
and the CAWD, applicable to any $q^m$-ary MDS code, $q \ge 2$, have also
been derived [1].

Most well-known MDS codes, e.g., GRS codes, satisfy $2 \le K \le N - 2$.
For such codes, the CAWD is upper-bounded by [1],

$$G_{0 \to mN\delta} \le q^{N+3+m(K-N)} \sum_{j=0}^{mN\delta} \binom{mN}{j} (q-1)^j \le q^{N+3+m(K-N)+mN H_q(\delta)} \qquad (1)$$

where $0 \le \delta \le (q-1)/q$, and $H_u(x)$ is the u-ary entropy function,
$u \ge 2$.
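
A minimal numerical sketch of bound (1), assuming it reads G_{0 -> h} <= q^{N+3+m(K-N)} * sum_{j=0}^{h} C(mN, j) (q-1)^j. The helper names and the parameters (a binary expansion with m = 5, N = 31, K = 15) are illustrative assumptions. The last function returns the largest weight h at which the bound stays below 1/2; because it uses an upper bound on the CAWD, the value it returns can only underestimate the corresponding weight computed from the true CAWD.

```python
from fractions import Fraction
from math import comb, log

def q_entropy(x, q=2):
    """q-ary entropy function H_q(x) for 0 <= x < 1."""
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)

def cawd_bound(q, m, N, K, h):
    """First upper bound in (1) on the cumulative AWD between weights 0 and h (assumed form)."""
    total = sum(comb(m * N, j) * (q - 1) ** j for j in range(h + 1))
    return Fraction(q) ** (N + 3 + m * (K - N)) * total

def largest_weight_below_half(q, m, N, K):
    """Largest h with cawd_bound < 1/2, or None if even h = 0 fails."""
    best = None
    for h in range(m * N + 1):
        if cawd_bound(q, m, N, K, h) < Fraction(1, 2):
            best = h
        else:
            break
    return best

q, m, N, K = 2, 5, 31, 15
d = largest_weight_below_half(q, m, N, K)
print(d, float(cawd_bound(q, m, N, K, d)))
```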

III.  Asymptotic  Properties 

Let Cx be the class of q-ary LE codes obtained from a $q^m$-ary
(N, K, D) MDS code, where $2 \le K \le N - 2$. Let d denote the
minimum distance of any member of Cx.

Theorem 1 For any $\epsilon > 0$, there is an integer $N_0 > 0$ such that a
majority of codes in Cx satisfy $H_q(d/n) > 1 - K/N - \epsilon$, $\forall N > N_0$.

In  other  words,  most  codes  in  Cx  are  asymptotically  good.  The  CAWD 
can  be  used  to  study  the  minimum  distances  of  the  LE  codes,  as  stated 
below. 

Proposition 1 The smallest q-ary weight $d_{MDS}$, such that $G_{0 \to d_{MDS}} \ge 1$,
is a lower-bound on the minimum distance of the best codes in the
class. The largest q-ary weight $d_{most}$, such that $G_{0 \to d_{most}} < 0.5$, is a
lower-bound on the minimum distance of most codes in the class.

Fig. 1 shows the asymptotic behaviour of $d_{most}$. Here, C is a primitive
Reed-Solomon (RS) code over GF($2^m$). We have computed $d_{most}$ for
$5 \le m \le 9$. We also show the BCH bound for comparable primitive
binary BCH codes. Evidently, $d_{most}$ is asymptotically good.


Figure  1 :  Asymptotic  Behaviour 

IV. Time-Varying Code

The time-varying code is a pseudo-random code based on all the members
of the class Cx. To explain how the code works, let us index the
member codes with consecutive integers: $Cx = \{C_{x,\ell}\}_{\ell=0}^{(q^m-1)^N - 1}$.

To encode the rth block of information, we shall use the code $C_{x,\ell}$,
where $\ell = r \ [\bmod\ (q^m - 1)^N]$. Since most members of Cx are good,
we are likely to pick a good code most of the time. With C being
the same Reed-Solomon code as before, it has been found that the
best codes in Cx dominate the overall performance of the time-varying
code, provided every member is decoded to its true minimum distance
[1].
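
The block-to-code selection rule can be sketched as follows. The mapping from a code index to an N-tuple of nonzero column multipliers is an assumption (a mixed-radix expansion, with the labels 1..q^m - 1 standing for the nonzero field elements); the text only says that the member codes are indexed by consecutive integers.

```python
def code_index(r, q, m, N):
    """Index of the member code used for the r-th information block."""
    return r % ((q ** m - 1) ** N)

def column_multipliers(index, q, m, N):
    """Hypothetical indexing: expand the index in base q^m - 1 into N labels in 1..q^m - 1."""
    base = q ** m - 1
    labels = []
    for _ in range(N):
        index, digit = divmod(index, base)
        labels.append(digit + 1)
    return tuple(labels)

q, m, N = 2, 2, 3
total = (q ** m - 1) ** N
all_tuples = {column_multipliers(code_index(r, q, m, N), q, m, N) for r in range(total)}
print(total, len(all_tuples))
```

With this indexing every member code is visited exactly once per period of (q^m - 1)^N blocks.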

I. Introduction

Let n = 2p be a positive even integer. Let GF(2) denote
the Galois field of order 2 and $V_n$ the GF(2)-vector space
$(GF(2))^n$.

$\delta_0$ denotes the Dirac symbol ($\delta_0(x) = 1$ if x = 0 and 0 otherwise).
For any subset E of $V_n$ or of $V_p$, the symbol $\phi_E$ denotes the
characteristic function of E in $V_n$ or $V_p$.

We distinguish between the addition in Z, denoted by +, and
the addition in GF(2), denoted by $\oplus$.

For any Boolean function f, the complement of f is the function
$f \oplus 1$.

We denote by $\chi(x)$ the character $(-1)^x$ on GF(2). The Walsh
transform of any real-valued function $\phi$ on $V_n$ is defined on $V_n$
by: $\hat{\phi}(s) = \sum_{x \in V_n} \phi(x)\, \chi(x \cdot s)$, where $\cdot$ denotes the usual
dot product on $V_n$.

Let f be a Boolean function on $V_n$. We denote by $f_\chi(x)$ the
function $\chi(f(x))$. The Walsh transform of $f_\chi(x)$ is the function:

$$\hat{f_\chi}(s) = \sum_{x \in V_n} \chi(f(x) \oplus x \cdot s).$$

The Boolean function f is called bent if for any element s of
$V_n$, $\hat{f_\chi}(s)$ has absolute value $2^p$. That is equivalent to the
fact that f is at maximum Hamming distance from the set
of all affine functions $g(x) = a \cdot x \oplus \epsilon$ ($a \in V_n$, $\epsilon \in GF(2)$).
A class of bent functions is called complete if it is globally
invariant under any affine nonsingular transformation of the
variable and under the addition of any affine function.
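
The definition can be checked directly on small examples. The sketch below represents vectors of V_n as integers (bit i is coordinate i, an encoding chosen here for convenience) and tests bentness by computing all Walsh coefficients; the sample functions are illustrative.

```python
def dot(a, b):
    """Dot product on V_n for vectors stored as integer bitmasks."""
    return bin(a & b).count("1") & 1

def walsh_of_sign(f, n):
    """Walsh coefficients sum_x (-1)^(f(x) xor x.s) for every s; f is a truth table of 0/1."""
    size = 1 << n
    return [sum((-1) ** (f[x] ^ dot(x, s)) for x in range(size)) for s in range(size)]

def is_bent(f, n):
    """True if every Walsh coefficient has absolute value 2^(n/2)."""
    return all(abs(w) == 2 ** (n // 2) for w in walsh_of_sign(f, n))

n = 4
# f(x) = x0 x1 xor x2 x3, a standard quadratic example
quadratic = [((x & 1) & (x >> 1 & 1)) ^ ((x >> 2 & 1) & (x >> 3 & 1)) for x in range(1 << n)]
# an affine function, which is as far from bent as possible
affine = [dot(x, 0b0101) ^ 1 for x in range(1 << n)]
print(is_bent(quadratic, n), is_bent(affine, n))
```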

If a Boolean function f on $V_n$ is bent, then the Boolean function
$\tilde{f}$ defined by: $\hat{f_\chi}(s) = 2^p \chi(\tilde{f}(s))$ is bent. $\tilde{f}$ is called the
"Fourier" transform of f (cf. [2]).

The known bent functions belong to the completed versions
of four classes:

1) Maiorana-McFarland's class [2], denoted by M. It is
the set of all the Boolean functions on $V_n$ of the form:
$f(x, y) = x \cdot \pi(y) \oplus g(y)$, where x and y belong to $V_p$, $\pi$ is
a permutation on $V_p$ and g is a Boolean function on $V_p$.

2) Partial Spreads class [2], denoted by PS, whose elements
are the sums (modulo 2) of the characteristic functions of $2^{p-1}$
or $2^{p-1} + 1$ "disjoint" p-dimensional subspaces of $V_n$ ("disjoint"
meaning that any two of these spaces intersect in 0 only, and
therefore that their sum is direct and equal to $V_n$).

3) Class D [1], which is the set of all the functions of the form:
$f(x, y) = x \cdot \pi(y) \oplus \phi_{E_1}(x)\, \phi_{E_2}(y)$, where $\pi$ is any permutation
on $V_p$ and $E_1$, $E_2$ are any linear subspaces of $V_p$ such that
$\pi(E_2) = E_1^\perp$.

4) Class C [1], which is the set of all the functions of the form:
$f(x, y) = x \cdot \pi(y) \oplus \phi_L(x)$, where $\pi$ is any permutation on $V_p$,
L is any linear subspace of $V_p$ such that, for any element $\lambda$ of
$V_p$, the set $\pi^{-1}(\lambda + L^\perp)$ is a flat.
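
As an illustration of the first class, the sketch below builds a Maiorana-McFarland function from a randomly chosen permutation and a random g, and verifies bentness through the Walsh coefficients. The integer encoding of (x, y) as x + (y << p), the seed and the helper names are assumptions made for this example.

```python
import random

def dot(a, b):
    """Dot product over GF(2) for vectors stored as integer bitmasks."""
    return bin(a & b).count("1") & 1

def is_bent(f, n):
    """True if every Walsh coefficient of (-1)^f has absolute value 2^(n/2)."""
    size = 1 << n
    return all(
        abs(sum((-1) ** (f[x] ^ dot(x, s)) for x in range(size))) == 2 ** (n // 2)
        for s in range(size)
    )

def maiorana_mcfarland(p, pi, g):
    """Truth table of f(x, y) = x . pi(y) xor g(y); the word (x, y) has index x + (y << p)."""
    size = 1 << p
    return [dot(x, pi[y]) ^ g[y] for y in range(size) for x in range(size)]

p = 3
rng = random.Random(1)
pi = list(range(1 << p))
rng.shuffle(pi)                                   # a random permutation of V_p
g = [rng.randrange(2) for _ in range(1 << p)]     # an arbitrary Boolean function on V_p
f = maiorana_mcfarland(p, pi, g)
print(is_bent(f, 2 * p))
```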

II. Generalized Partial Spreads, Geometric Forms of Bent Functions

Our main result is the following:

Theorem 1 Let $\{E_1, \ldots, E_k\}$ be a family of p-dimensional
subspaces of $V_n$ and $m_1, \ldots, m_k$ (positive or negative) integers.
Let f(x) be a Boolean function on $V_n$. Assume that:

$$\sum_{i=1}^{k} m_i\, \phi_{E_i}(x) = 2^{p-1} \delta_0(x) + f(x) \qquad (i)$$

then f is bent and

$$\sum_{i=1}^{k} m_i\, \phi_{E_i^\perp}(x) = 2^{p-1} \delta_0(x) + \tilde{f}(x). \qquad (ii)$$

We denote by GPS (generalized partial spreads) the class of all functions which satisfy (i).
We call (i) a geometric form of f.

Any element of the Partial Spreads class PS belongs to the class GPS of functions satisfying (i). Any element
of class M or of class D is equivalent, up to a translation
on the variable, to one of the elements of class GPS, or to
its complement. Thus, it belongs to the completed version of
class GPS.

For any element f of class GPS and any linear isomorphism $\psi$
of $V_n$, the functions $f \circ \psi$ and $f \oplus 1$ belong to GPS. However,
class GPS is not complete.
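
A small check of the geometric form for n = 4, p = 2, with all multiplicities equal to 1. The two subspaces below (the vectors supported on the low two bits and on the high two bits) are an illustrative choice; they intersect in 0 only, so the sum of their characteristic functions minus 2^{p-1} delta_0 is a Boolean function, and it turns out to be bent as the theorem predicts.

```python
def dot(a, b):
    """Dot product over GF(2) for vectors stored as integer bitmasks."""
    return bin(a & b).count("1") & 1

def is_bent(f, n):
    """True if every Walsh coefficient of (-1)^f has absolute value 2^(n/2)."""
    size = 1 << n
    return all(
        abs(sum((-1) ** (f[x] ^ dot(x, s)) for x in range(size))) == 2 ** (n // 2)
        for s in range(size)
    )

n, p = 4, 2
E1 = {0b0000, 0b0001, 0b0010, 0b0011}   # illustrative 2-dimensional subspace
E2 = {0b0000, 0b0100, 0b1000, 0b1100}   # a second one, meeting E1 in 0 only
lhs = [(x in E1) + (x in E2) for x in range(1 << n)]          # sum of m_i * phi_{E_i}, m_i = 1
f = [lhs[x] - 2 ** (p - 1) * (x == 0) for x in range(1 << n)]  # solve (i) for f
print(f, is_bent(f, n))
```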

III. New Bent Functions Deduced from the Theorem

Proposition 1 Let n = 2p be any even integer. Let $\pi$, $\pi'$ and
$\pi''$ be three permutations on $V_p$ such that $\pi \oplus \pi'$ and $\pi \oplus \pi''$
are permutations. Assume that their inverses are $\pi^{-1} \oplus \pi'^{-1}$
and $\pi^{-1} \oplus \pi''^{-1}$ (respectively). Let $\epsilon$ and $\eta$ be two elements of
GF(2), and f(x, y) the Boolean function defined on $V_n$ by:

$$f(x, y) = (x \cdot \pi(y) \oplus 1)(x \cdot \pi'(y) \oplus \epsilon) \oplus (x \cdot \pi(y))(x \cdot \pi''(y) \oplus \eta).$$

Then f is bent.

We have checked that these functions do not belong in general
to the completed class of M.

IV. A New Characterization of Bent Functions

The theorem extends straightforwardly to a more general framework:
let f be a Boolean function on $V_n$; let $E_1, \ldots, E_k$ be
p-dimensional subspaces of $V_n$ and $m_1, \ldots, m_k$ integers;
assume that

$$\sum_{i=1}^{k} m_i\, \phi_{E_i}(x) = 2^{p-1} \delta_0(x) + f(x) \quad [\bmod\ 2^p] \qquad (1)$$

then f is bent.

We have proved, with Philippe Guillot, that the class of those
Boolean functions that satisfy (1) is that of all bent functions.

Abstract — We show how a combinatorial optimization
method, the noising method, can be used for
constructing covering codes.


I.  Introduction 

The noising method and its applications to some graph problems
were described in [1] and [2] (see also [5] and [4]). It is a
heuristic for combinatorial optimization problems of the form
$\min\{f(s) : s \in S\}$. The elements in S are called solutions
and f is the evaluation function. A transformation is any operation
transforming a solution $s \in S$ into a solution $s' \in S$.
An elementary transformation is a transformation changing
one feature of s without changing its global structure; it defines
the neighbourhood N(s) of a solution s as the set of all
solutions s' obtained from s by an elementary transformation.

This makes possible the definition of an iterative-improvement
method, the descent method: from a current
solution s, take a solution $s' \in N(s)$. If f(s') < f(s), take s'
as the current solution, otherwise keep s. Iterate this process.
When no $s' \in N(s)$ is better than the current solution s, a local
minimum is reached (with respect to this neighbourhood,
i.e., to this elementary transformation).

The  noising  method  is  based  on  descent.  Starting  with  an 
initial  solution,  repeat  the  following  steps  : 

-  add  noise  to  the  data  (in  order  to  change  the  values  of  /). 

-  apply  the  descent  method  to  the  current  solution  for  the 
noised  data. 

For each iteration, the amount of noise is decreased until
it reaches 0 in the last iteration. The final solution is the best
solution computed during the process.


To add noise, we give to each vector $z \in F_2^n$ a value $v(z) \in
[1-r, 1+r]$, where v is uniformly distributed and r is the rate
of the additional noise. The noised function, $f_N(C)$, is given
by: $f_N(C) = \sum_{z \in F_2^n,\ d(z, C) > t} v(z)$. When the rate r is zero,
$v(z) = 1$ for all $z \in F_2^n$, and $f = f_N$.

If we find a code C such that f(C) = 0, we start the
whole process again with a size decreased by one.
II. Noising for Covering Codes
Let $C \subseteq F_q^n$ be a q-ary code of length n. Its covering radius
t(C) is $t(C) = \max\{d(z, C) : z \in F_q^n\}$. Let $K_q(n, t)$ be the
smallest size of a q-ary code with length n and covering radius
t. The function K has been extensively studied, in particular for
q = 2 or 3 (see [3] for a recent survey on covering radius). Upper
bounds on K are obtained by constructions; some of them use
heuristics based on descent, for example simulated annealing.

In the following we restrict ourselves to q = 2, but there is
no difficulty in extending the method to any q.

The set of solutions S is the set of all binary codes of given
length n and given size. The evaluation function f is the
number of vectors in $F_2^n$ at distance greater than t from the
current solution $C \subseteq F_2^n$: $f(C) = |\{z \in F_2^n : d(z, C) > t\}|$.
The goal is to have f(C) = 0, proving that $K_2(n, t) \le |C|$.
From a random initial solution C, a new solution C' is obtained
by complementing one bit of one codeword (this defines
our neighbourhood).
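
The pieces described above (evaluation function, one-bit neighbourhood, noised function, decreasing noise) can be put together in a minimal sketch. Several details are assumptions not fixed by the text: the noise rate decreases linearly, one random neighbour is tried per descent step, a fixed number of tries stands in for a true local-minimum test, and codewords are kept in a list so repeated codewords are not excluded.

```python
import random

def distance_to_code(z, code):
    """Hamming distance from the word z to the nearest codeword (words are integers)."""
    return min(bin(z ^ c).count("1") for c in code)

def uncovered(code, n, t):
    return [z for z in range(1 << n) if distance_to_code(z, code) > t]

def f(code, n, t):
    """Evaluation function: number of words at distance greater than t from the code."""
    return len(uncovered(code, n, t))

def noised_f(code, n, t, v):
    """Noised evaluation: each uncovered word z counts v[z] instead of 1."""
    return sum(v[z] for z in uncovered(code, n, t))

def descent(code, n, t, v, rng, tries):
    """Descent on the noised function; one random neighbour per step (assumption)."""
    code = list(code)
    current = noised_f(code, n, t, v)
    for _ in range(tries):
        i, bit = rng.randrange(len(code)), rng.randrange(n)
        old = code[i]
        code[i] ^= 1 << bit                   # complement one bit of one codeword
        candidate = noised_f(code, n, t, v)
        if candidate < current:
            current = candidate
        else:
            code[i] = old                     # reject the move
    return code

def noising(n, t, size, iterations=20, r_max=0.8, tries=200, seed=0):
    rng = random.Random(seed)
    code = [rng.randrange(1 << n) for _ in range(size)]
    best, best_val = list(code), f(code, n, t)
    for it in range(iterations):
        r = r_max * (1 - it / (iterations - 1))   # linear schedule (assumption), 0 at the end
        v = [rng.uniform(1 - r, 1 + r) for _ in range(1 << n)]
        code = descent(code, n, t, v, rng, tries)
        val = f(code, n, t)
        if val < best_val:
            best, best_val = list(code), val
    return best, best_val

best, val = noising(n=4, t=1, size=4, iterations=5, tries=50, seed=1)
print(best, val)
```

A returned value of 0 would certify that a code of the chosen size with covering radius at most t exists; a positive value proves nothing either way.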

Abstract — We introduce a new error correcting
code, which we call Diamond code. Diamond codes
combine the error correcting capabilities of product
codes and the reduced memory requirements from
CIRC, the code applied in the CD system.

I.  Diamond  Code  Construction 

The Diamond Code C calls for two codes, C1 and C2, of equal
length n and defined over the same alphabet. C consists of
the bi-infinite strips of height n, with each column in C1 and
each diagonal in C2.

A convenient way of constructing Diamond codes is by using
linear weakly cyclic codes for C1 and C2.

Definition: A linear code B is called weakly cyclic if
$(b_0, b_1, \ldots, b_{n-2}, 0) \in B \Leftrightarrow (0, b_0, b_1, \ldots, b_{n-2}) \in B$.

Suppose both C1 and C2 are weakly cyclic codes, with p and q
parity symbols, respectively. The minimal span codewords in
C look like (p + 1) x (q + 1) diamonds as indicated in Figure 1.
By the weakly cyclic property of C1 and C2, these elementary
diamonds can be positioned anywhere within the code array.
By taking suitable linear combinations of shifted elementary
diamonds, we can produce codewords that are systematic in
the s = n - (p + q) top rows. Moreover, this construction
shows that each information symbol in the upper s rows can
affect the parities in at most n - p columns.

Figure 1: Elementary Diamond codewords
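
The weakly cyclic property can be tested by brute force on a small binary code. The sketch reads the definition as an equivalence (a word ending in 0 is in B exactly when its one-place shift is), and the sample code, generated by g = 1 + x + x^3 and its shift in length 5, is an illustrative assumption.

```python
from itertools import product

def span_gf2(generators):
    """All GF(2)-linear combinations of the generator words, as a set of tuples."""
    n = len(generators[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        w = [0] * n
        for c, g in zip(coeffs, generators):
            if c:
                w = [a ^ b for a, b in zip(w, g)]
        words.add(tuple(w))
    return words

def is_weakly_cyclic(code):
    """Check (b0..b_{n-2}, 0) in B  <=>  (0, b0..b_{n-2}) in B."""
    for w in code:
        if w[-1] == 0 and (0,) + w[:-1] not in code:
            return False
        if w[0] == 0 and w[1:] + (0,) not in code:
            return False
    return True

B = span_gf2([(1, 1, 0, 1, 0), (0, 1, 1, 0, 1)])      # g(x) and x*g(x), length 5
not_weakly_cyclic = span_gf2([(1, 0, 0, 0, 0)])
print(is_weakly_cyclic(B), is_weakly_cyclic(not_weakly_cyclic))
```

The code B is weakly cyclic without being cyclic: the cyclic shift of (0, 1, 1, 0, 1) is not a codeword.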

II.  Encoding 

A Diamond code word whose columns contain only zeros for
negative time can efficiently be encoded by alternating C1
and C2 encodings, starting with the leftmost nonzero C1 word
and ending with the rightmost nonzero C2 word. Such an
encoding can be realized by the structure from Figure 2. The
memory contents should be set to zero before the first data
is fed into the encoder. The symbols immediately after the
C1 encoder correspond to columns of the Diamond code C
that are written to the channel. The feedback link in Figure 2
makes it an infinite impulse response structure. The remark
at the end of Section I, however, implies that the structure
from Figure 2 has a finite impulse response if C1 and C2 both
are weakly cyclic and the encoder is initialized at the all zero
state.

Figure 2: Systematic Diamond encoder

III. Decoding

A decoder for C is obtained by combining decoders for C1 and
C2. In Figure 3, we show how a Diamond decoder is related to
the decoder configuration of CIRC (Compact Disc) [1]. The
Diamond decoder applies iterative decoding, which is known to
be very powerful, especially for correcting random errors and
short bursts. For an optimal performance, all symbols should
be checked by both C1 and C2. A Diamond code does so (like
a product code), but CIRC does not. Like CIRC, a Diamond
code allows for a Forney interleaver between consecutive decoding
stages (the "delay" triangles in Fig. 3), thus reducing
the memory requirements, while retaining the distance and
decoding potential of the product code of C1 and C2.

Figure 3: Decoding formats


IV.  Block  variations 

For data recording applications, there is a need for independent,
randomly rewritable data blocks. Three types of block
codes will be discussed that share many features with C, which
allows us to share much of their encoding and decoding hardware
with the hardware for C. Each of them offers a different
trade-off between rate, performance and similarity with the
parent code C as a function of the blocksize.

Abstract — Performance loss caused by the inter-track
interference (ITI) in recording channels may be
alleviated through the use of multiple-head systems
simultaneously writing and reading a number of adjacent
tracks. We consider a five-head, three-track
system, and show that there is no loss in performance
of the system due to ITI, under some broad assumptions.
We also show that, under these assumptions,
the codes designed to provide certain coding gain in
single-track, single-head systems, provide the same
coding gain in five-head, three-track systems.

I.  Summary 

We consider disk recording systems where inter-track interference
can be described as follows: only adjacent tracks interfere,
and when a reading head is positioned over one of the
tracks, it responds to the magnetization of an adjacent track
as if it were positioned over that track but with an amplitude
modified by a weighting parameter $\alpha$. Information is written
in $N_t$ adjacent tracks and simultaneously detected by $N_h$
reading heads. We analyse a discrete-time model for the magnetic
recording channel with input $\{a_n^k\}$, $1 \le k \le N_t$, impulse
response $\{h_n\}$, and output $\{y_n^k\}$, $1 \le k \le N_h$, given by

$$y_n^k = \sqrt{E} \sum_m \left(\alpha a_m^{k-1} + a_m^k + \alpha a_m^{k+1}\right) h_{n-m} + \eta_n^k,$$

where $h_n$ are integers, $\eta_n^k$ are independent Gaussian random
variables with zero mean and variance $\sigma^2$, and E is a constant
related to the output voltage amplitude. We refer to
$E/\sigma^2$ as the signal-to-noise ratio (SNR) per track, and to
$H(D) = \sum_n h_n D^n$ as the channel transfer function. Special
cases of these systems with $N_t = N_h = 2$ have been studied
by Barbosa [1], Siala and Kaleh [2], and Soljanin and Georghiades
[3].
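
The noiseless part of this model can be sketched as below. Two points are assumptions made for the example: tracks that are not written are treated as all-zero, and the heads sit over positions -1 to N_t, which gives five heads for three tracks. Gaussian noise would be added to each sample separately.

```python
from math import sqrt

def head_outputs(tracks, h, alpha, E=1.0, head_positions=None):
    """Noiseless samples sqrt(E) * sum_m (alpha*a[k-1][m] + a[k][m] + alpha*a[k+1][m]) * h[n-m]."""
    num_tracks = len(tracks)
    length = len(tracks[0])

    def a(k, m):
        # unwritten positions outside the recorded band are treated as zero (assumption)
        return tracks[k][m] if 0 <= k < num_tracks else 0

    if head_positions is None:
        head_positions = range(-1, num_tracks + 1)   # hypothetical head placement
    out = {}
    for k in head_positions:
        mixed = [alpha * a(k - 1, m) + a(k, m) + alpha * a(k + 1, m) for m in range(length)]
        out[k] = [
            sqrt(E) * sum(mixed[m] * h[n - m] for m in range(length) if 0 <= n - m < len(h))
            for n in range(length + len(h) - 1)
        ]
    return out

tracks = [[1, 0], [0, 0], [0, 0]]          # three tracks, a single 1 on the first track
y = head_outputs(tracks, h=[1, -1], alpha=0.25)
print(y)
```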

We compare the performance of various detection systems
on the basis of minimum Euclidean distance, $d_{\min}$. This distance
determines the performance for high values of SNR,
when the probability of an error event in the system is closely
approximated by $Q(d_{\min}\sqrt{SNR})$.

Proposition 1 Let $d_0$ be the minimum distance of the composing
single-track channels. (For example, for the H(D) =
(1 - D) channel, $d_0^2 = 2$.) Then

$$d_{\min}^2 = \begin{cases} (1 + 2\alpha^2)\, d_0^2 & \text{if } 0 \le \alpha \le 1 - \sqrt{2}/2, \\ 2(1 + 2\alpha^2 - 2\alpha)\, d_0^2 & \text{if } 1 - \sqrt{2}/2 \le \alpha \le 1/2, \end{cases}$$

as long as $d_0^2 \le 6$.

Note that there is no performance loss due to ITI as long as
$d_{\min}^2 \ge d_0^2$, i.e., $0 \le \alpha \le 1/2$, which is the entire interval
under consideration. Note also that the above condition holds
for $H(D) = (1 - D)(1 + D)^N$, $N \in \{0, 1, 2, 3\}$, i.e., for the
most common magnetic recording channel transfer functions.
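
The two branches of the proposition can be evaluated numerically to see that they meet at alpha = 1 - sqrt(2)/2 and never fall below the single-track value on [0, 1/2]. This sketch assumes the piecewise expression reads (1 + 2 alpha^2) d_0^2 and 2(1 + 2 alpha^2 - 2 alpha) d_0^2; the function name is hypothetical.

```python
from math import sqrt

def dmin_squared(alpha, d0_squared):
    """Squared minimum distance of the multi-head system for 0 <= alpha <= 1/2 (assumed form)."""
    if not 0 <= alpha <= 0.5:
        raise ValueError("alpha must lie in [0, 1/2]")
    if alpha <= 1 - sqrt(2) / 2:
        return (1 + 2 * alpha ** 2) * d0_squared
    return 2 * (1 + 2 * alpha ** 2 - 2 * alpha) * d0_squared

d0_squared = 2                                    # the (1 - D) channel example from the text
grid = [i / 200 for i in range(101)]              # alpha from 0 to 0.5
values = [dmin_squared(a, d0_squared) for a in grid]
print(min(values), max(values))
```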


Corollary 1 Under the assumptions of the preceding proposition,
a single-track code that provides an increase in the
single-track minimum distance to $d_0^c = \sqrt{g}\, d_0$ when applied to
each track, results in an increase in the two-track minimum
distance to $d_{\min}^c = \sqrt{g}\, d_{\min}$, as long as $d_{\min}^c \le \sqrt{6}$.

Note that the above holds for a dc-free coded 1 - D channel
as well as for a Nyquist-free coded $(1 - D)(1 + D)^2$ channel.

The performance of five different detection systems is compared
in Fig. 1, which plots $d_{\min}$ for each of the five cases as a
function of the interference parameter $\alpha$.

Figure 1: Performance of five different detection systems
for channel of three interfering tracks.

Abstract — In this paper we consider designing
multi-track codes for parallel networks of partial-response
channels. The codes we design have the capability
of combating inter-track interference and also
providing additional gains.


In single-track magnetic recording systems data is coded,
modulated and written on tracks independently from neighboring
or any other tracks, with the same modulation code
used on each track. The main idea of a multi-track system
is to encode, write and read in parallel. Decreasing the k
constraint is always desirable in recording systems, because
this brings better synchronization and consequently higher access
speeds. However, the price of a low k constraint is a lower
code rate, in other words a capacity loss. The advantage
of using multi-track systems is the altered form of the k constraint.
In a multi-track recording system, the k constraint
on each channel is removed and a joint (vector) k constraint is
imposed on the channel output sequences. As pointed out in
[1], this provides higher rates, and the information for timing
and gain control can be obtained from any of the tracks which
are coded jointly.

The multi-track system under investigation is modeled as a
set of parallel partial-response channels. Each channel is of
the form $(1 - D)(1 + D)^n$. In addition to this, the effect of
the interference from each track to others is formulated by an
inter-track interference (ITI) matrix, T. Matched Spectral-Null
(MSN) modulation codes, which are intended for use on
noisy partial-response channels with a finite input alphabet
size, were described in [2]. These codes provide a significant
increase in the minimum Euclidean distance, limit the maximum
run length of identical samples and are designed to eliminate
quasi-catastrophic sequences. The simplest approach to
designing MSN codes for multi-track systems is to code each
track independently. However, independent coding of each
track for a multi-track system is undesirable for two reasons.
First, it does not have the effect of decreasing the k constraint.
Second, such a scheme ignores the existence of ITI, which is
inherent in all multi-track systems under investigation, and is
not expected to provide adequate coding gains except in very
special forms of inter-track interference.

In this paper we consider designing multi-track codes for
parallel networks of (1 - D) channels. The codes we design
have the capability of combating ITI. Formally, let F
be the graph representing the cross product of two canonical
diagrams of single-track MSN codes with edge labels from
$\{0, 1\}^2$. Let $a = a_0, \ldots, a_n$ and $b = b_0, \ldots, b_n$ be sequences
generated by paths in F, $T_a = \{\sigma_0, \sigma_1^a, \sigma_2^a, \ldots, \sigma_{n+1}^a\}$ and
$T_b = \{\sigma_0, \sigma_1^b, \sigma_2^b, \ldots, \sigma_n^b, \sigma_{n+1}^b\}$. The sequence $e = e_0, \ldots, e_n$
with $e_i = (a_{i1} - b_{i1}, a_{i2} - b_{i2})$ is referred to as the difference
sequence corresponding to a and b. If $\sigma_{n+1}^a = \sigma_{n+1}^b$ then e
is called a difference event, and if $\sigma_{n+1}^a = \sigma_{n+1}^b = \sigma_0$, then
e is called a difference cycle. The difference sequences corresponding
to a and b on track 1 and track 2 are denoted as $e_1$
and $e_2$, respectively. Let $A_{(n+1) \times (n+1)}$ be the autocorrelation
matrix corresponding to the (1 - D) channel and define an inner
product in $R^{n+1}$ as $\langle u, v \rangle_A = u A v^t$ with $u, v \in R^{n+1}$. For
the network of two parallel (1 - D) channels with symmetric
ITI coefficient $\alpha$, the distance between two allowable recorded
sequences is given by [3],

$$d^2(e) = (1 + \alpha^2)\left(\|e_1\|_A^2 + \|e_2\|_A^2\right) + 2 \cdot 2\alpha \langle e_1, e_2 \rangle_A .$$
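
The distance expression can be cross-checked against a direct computation. The sketch assumes that the autocorrelation matrix of the (1 - D) channel has 2 on the diagonal and -1 on the two adjacent diagonals, and that each track output is the (1 - D)-filtered sum of its own sequence and alpha times the other track's sequence. Both are modelling assumptions made for the example, as are the function names.

```python
def autocorr_matrix(size):
    """Assumed autocorrelation matrix of 1 - D: 2 on the diagonal, -1 next to it."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(size)]
            for i in range(size)]

def inner(u, v, A):
    """Inner product <u, v>_A = u A v^t."""
    return sum(u[i] * A[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def dist_sq(e1, e2, alpha):
    """Squared distance from the closed-form expression."""
    A = autocorr_matrix(len(e1))
    return (1 + alpha ** 2) * (inner(e1, e1, A) + inner(e2, e2, A)) + 4 * alpha * inner(e1, e2, A)

def dist_sq_direct(e1, e2, alpha):
    """Same quantity computed by filtering both interfered tracks with 1 - D."""
    def dicode(x):
        padded = [0] + list(x) + [0]
        return [padded[i] - padded[i - 1] for i in range(1, len(padded))]
    y1 = dicode([a + alpha * b for a, b in zip(e1, e2)])
    y2 = dicode([alpha * a + b for a, b in zip(e1, e2)])
    return sum(v * v for v in y1 + y2)

e1, e2, alpha = [1, 0, -1], [0, 1, 0], 0.3
print(dist_sq(e1, e2, alpha), dist_sq_direct(e1, e2, alpha))
```

For e2 = -e1 the expression collapses to (1 - alpha)^2 times the uninterfered value, which is the loss that the condition on difference cycles is meant to exclude.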


To design two-track codes which are able to combat the
performance loss caused by ITI, we consider subgraphs of F
with the property that, for any difference cycle contained in
this subgraph, the difference cycles corresponding to track 1
and track 2 satisfy $e_1 \ne -e_2$. We denote such a subgraph
of F by H. Then we have the following proposition, which
provides improvements over the results of [3].

Proposition: Let $a = a_0, \ldots, a_n$ and $b = b_0, \ldots, b_n$
be two distinct sequences generated by paths in H, $T_a =
\{\sigma_0, \sigma_1^a, \sigma_2^a, \ldots, \sigma_0\}$ and $T_b = \{\sigma_0, \sigma_1^b, \ldots, \sigma_n^b, \sigma_0\}$, with
$a_0 \ne b_0$.


min  d2(c) 

^0 


(  2(1  + a2)  if  0  <  a  < 

\  4(1  -  a)2  +  2a  if  <  <*  <  I 


The capacity of H we considered was 0.7997. We designed
codes with rates up to capacity and confirmed the gains predicted
by the proposition with simulations.

Abstract — A new family of MDS array codes is
presented. The code arrays contain p information
columns and r independent parity columns, where p is
a prime. We give necessary and sufficient conditions
for our codes to be MDS, and then prove that if p belongs
to a certain class of primes these conditions are
satisfied up to $r \le 8$. We also develop efficient decoding
procedures for the case of two and three column
errors, and any number of column erasures. Finally,
we present upper and lower bounds on the average
number of parity bits which have to be updated in an
MDS code over GF($2^m$), following an update in a single
information bit. We show that the upper bound
obtained from our codes is close to the lower bound
and does not depend on the size of the code symbols.

I.  Introduction 

This work is concerned with maximum distance separable
(MDS) codes. The Reed-Solomon (RS) codes are a well-known
example of MDS codes. However, with Reed-Solomon codes,
(a) the encoding and decoding procedures are performed as
operations over a finite field, and (b) an update in a single information
bit requires an update in all the parity symbols and
affects a number of bits in each symbol. These two properties
of RS codes are quite undesirable for certain channels. Firstly,
the fact that encoding/decoding is performed in a finite field
makes it infeasible to use large symbols, since the size of the
field grows exponentially with the symbol size. Secondly, the
fact that an update in a single information bit requires
re-computing most of the parity bits is particularly undesirable
in storage applications where the stored data has to be frequently
updated in real-time. In this work, we present a new
family of MDS codes having the following two properties: encoding
and decoding may be accomplished with simple cyclic
shifts and XOR operations on the code symbols, without finite
field operations; and an update in an information bit affects
a minimal number of parity bits.

II.  The  New  MDS  Array  Codes 

Our new codes are based on recent work in array codes [1,
3]. We assume that the information is presented as a two-dimensional
array of bits. Henceforth we will identify the
symbols of an MDS code with the columns of such an array.
Thus the errors that can occur are column errors.

A trivial example of an MDS array code of this type is a
simple parity code. This code is defined by requiring that
the last column in the array is a parity column, given by the
exclusive-OR of the other columns. The first nontrivial generalization
of the parity code is the EVENODD code introduced
in [1]. The EVENODD code has columns of size p - 1 for some
prime p, and requires two parity symbols. It can correct one
error or two erasures.

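In this paper, we generalize the construction of the EVENODD
code to a family of codes with p information columns and
r parity columns, for $r \ge 1$. We assume that p is a prime number,
and let $M_p(x) = 1 + x + \cdots + x^{p-1}$ with $M_p(x) \in GF(2)[x]$.
Consider the code C whose entries are in the ring of polynomials
modulo $M_p(x)$, defined by the parity-check matrix:

$$H = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 & 0 & \cdots & 0 \\ 1 & \alpha & \cdots & \alpha^{p-1} & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha^{r-1} & \cdots & \alpha^{(r-1)(p-1)} & 0 & 0 & \cdots & 1 \end{pmatrix}$$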


It is not difficult to show that this code is MDS for all p when
r = 2 or r = 3. However, this is no longer true when $r \ge 4$. We
give necessary and sufficient conditions for the code to be MDS
when $r \ge 4$. Although we determined completely the primes
p < 100 for which this code is MDS when $r \le 8$, checking the
necessary and sufficient conditions in the general case may
be very complex. Our solution to this problem is related to
certain generalizations of Vandermonde determinants called
alternants. Using alternants, we have been able to show that
if 2 is primitive in $F_p$, then our codes are MDS up to r = 5 for
all $p \ne 3$, and up to r = 8 for all $p \notin \{3, 5, 11, 13, 19, 29\}$.
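
A minimal sketch of encoding with cyclic shifts and XOR only, under the assumption that alpha is the class of x in the ring of binary polynomials modulo M_p(x), so that multiplication by alpha^e is a cyclic shift of a length-p register followed by one conditional XOR. Columns are stored as integer bitmasks of p - 1 bits; all function names are hypothetical. The brute-force routine checks the column distance r + 1 on the smallest case only.

```python
from itertools import product

def times_alpha_power(col, e, p):
    """Multiply a column (bitmask of p-1 coefficients) by alpha^e modulo M_p(x)."""
    e %= p
    full = (1 << p) - 1
    rot = ((col << e) | (col >> (p - e))) & full   # cyclic shift: x^p = 1 since M_p(x) divides x^p - 1
    if (rot >> (p - 1)) & 1:
        rot ^= full                                 # x^(p-1) = 1 + x + ... + x^(p-2) modulo M_p(x)
    return rot

def encode(info, r, p):
    """Append r parity columns: parity j is the XOR of alpha^(i*j) times information column i."""
    parity = []
    for j in range(r):
        acc = 0
        for i, col in enumerate(info):
            acc ^= times_alpha_power(col, i * j, p)
        parity.append(acc)
    return list(info) + parity

def min_column_distance(r, p):
    """Minimum number of nonzero columns over all nonzero codewords (brute force)."""
    best = None
    for info in product(range(1 << (p - 1)), repeat=p):
        if not any(info):
            continue
        weight = sum(1 for col in encode(info, r, p) if col)
        best = weight if best is None else min(best, weight)
    return best

print(encode((1, 0, 0), 2, 3), min_column_distance(2, 3), min_column_distance(3, 3))
```

The brute force is exponential in p(p - 1) and is meant only as a sanity check for p = 3.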


III.  Decoding  and  Information  Updates 

We  present  a  decoding  algorithm  for  the  case  of  two  symbol 
errors,  that  is  for  r  =  4.  Notably,  this  algorithm  does  not 
require  finite  field  operations.  This  extends  the  algorithms 
of  [3],  applicable  only  for  the  case  of  a  single  symbol  error. 

Finally, we present lower and upper bounds on the average
number $\eta(C)$ of parity bits affected by an update in a single
information bit. In particular, we investigate the behavior of
$\eta(C)$ for MDS codes over GF($2^m$). It is shown that for our
codes $\eta(C)$ does not depend on the size of the code symbols.
In contrast, we also show that for Reed-Solomon codes, as well
as for the MDS codes of Blaum and Roth [3], $\eta(C)$ increases
linearly with the symbol size.

All  these  properties  of  the  new  MDS  array  codes  make  them 
very  well  suited  for  applications  where  the  size  of  the  code 
symbols  is  required  to  be  large.  We  refer  the  reader  to  [2]  for 
further  details. 
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Abstract  —  We  give  a  rigorous  proof  of  Etzion  and 
Wei's  conjecture  about  the  capacity  rate  of  two- 
dimensional  Runlength  Limited  Codes. We  also  provide 
an  alternative  approach  to  compute  the  capacity  rate. 


SUMMARY 

Runlength-limited (RLL) codes are binary codes whose
codewords satisfy maximum and minimum runlength
constraints. Etzion and Wei [1] have studied the
extension of runlength-limited codes to two
dimensions. One of the most fundamental problems is
to determine the capacity rate of 2-D RLL codes,
i.e., the highest code rate possible under a given
set of runlength constraints.

A 2-D $n_1 \times n_2$ $(d_1, k_1, d_2, k_2; d_3, k_3, d_4, k_4)$ array is
an $n_1 \times n_2$ binary array with the following parameters:

(1) $d_1$ ($d_2$) is the shortest run of ZEROs (ONEs) horizontally,

(2) $k_1$ ($k_2$) is the longest run of ZEROs (ONEs) horizontally,

(3) $d_3$ ($d_4$) is the shortest run of ZEROs (ONEs) vertically,

(4) $k_3$ ($k_4$) is the longest run of ZEROs (ONEs) vertically.

If the horizontal constraints are the same as the
vertical constraints, then it is a $(d_1, k_1, d_2, k_2)$ array.

The capacity rate of 2-D $(d_1, k_1, d_2, k_2)$ arrays is
defined as:

$$C = \lim_{n_1, n_2 \to \infty} \frac{\log F(n_1, n_2)}{n_1 \times n_2}$$

where $F(n_1, n_2)$ is the number of valid $n_1 \times n_2$
configurations with runlength constraints.

To determine the capacity rate of 2-D RLL codes,
we assign a Gibbs measure associated with the
given 2-D runlength constraints as follows. Let

$$\Lambda = \Lambda^{(n)} = \{\mathbf{n} = (n_1, n_2) : 0 \le |n_1|, |n_2| \le n\}$$

be the finite box of $\mathbb{Z}^2$. We define an energy
function for configurations on the finite set $\Lambda$ by

$$U_\Lambda(x_\Lambda) = \sum_{C \subset \Lambda} V_C(x_C)$$

where

$$V_C(x_C) = \begin{cases} J & \text{if } C \text{ is a minimal violating subconfiguration} \\ 0 & \text{otherwise.} \end{cases}$$

Then define a Gibbs field by:

$$P_\Lambda(x_\Lambda) = Z_\Lambda^{-1} \exp\{-U_\Lambda(x_\Lambda)\}, \qquad (1)$$

where $Z_\Lambda$ is the normalization factor called the
partition function.

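Theorem 1. $C = \lim_{J \to \infty} h_J$, where

$$h_J = \lim_{n \to \infty} \frac{1}{|\Lambda|} H_{J,\Lambda}(X_{\Lambda^{(n)}})$$

is the entropy rate of the Gibbs field (1)
associated with the 2-D RLL constraints for fixed $J$,
and

$$H_{J,\Lambda}(X_{\Lambda^{(n)}}) = -\sum_{x_\Lambda} P_\Lambda(x_\Lambda) \log P_\Lambda(x_\Lambda).$$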

This  result  was  first  conjectured  by  Etzion  and 
Wei  [1].  We  give  a  rigorous  proof. 

The determination of the entropy rate of a Gibbs
measure is a notoriously difficult problem. We find
an alternative formula for the capacity rate of 2-D
RLL codes which can be used to approximate the
capacity rate. Let us describe the idea by a simple
example. For the 2-D $(1, \infty, 1, 1)$ RLL code we define a
sequence of matrices $A_n$ as follows:

First find all $n \times 1$ column vectors which satisfy
the column constraints and label them by $x_1, x_2, \ldots, x_{F(n)}$,
where $F(n)$ is the number of $n$-vectors which satisfy
the constraints. We assign an $F(n) \times F(n)$ merging
indicator matrix $A_n = (A_{i,j})$, where

$$A_{i,j} = \begin{cases} 1 & \text{if merging } x_i \text{ and } x_j \text{ results in a valid } n \times 2 \text{ array} \\ 0 & \text{otherwise.} \end{cases}$$


We  define  another  sequence  of  matrices  Bn's  by 

B-C) 

Then  we  have  the  recursive  form: 


where $B_n^T$ is the transpose of $B_n$. Denote by $u_n$ the
largest eigenvalue of the matrix $A_n$; then we have
the following theorem.

Theorem 2. For the 2-D $(1, \infty, 1, 1)$ code, the capacity rate is

$$C = \lim_{n \to \infty} \frac{1}{n} \log u_n.$$

This method can be extended to other 2-D RLL codes.
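
The merging indicator matrix can be explored numerically. The sketch below is illustrative only: it assumes that the $(1, \infty, 1, 1)$ constraint means every run of ONEs has length exactly one in both directions (no two adjacent ONEs), encodes columns as integers, and uses plain power iteration for the largest eigenvalue. The function names are hypothetical and this is not the authors' recursive construction.

```python
import math


def valid_columns(n):
    """Columns of height n (as bit masks) with no two vertically adjacent ONEs."""
    return [c for c in range(1 << n) if c & (c >> 1) == 0]


def merging_matrix(n):
    """A[i][j] = 1 if columns i and j may sit side by side (no horizontally adjacent ONEs)."""
    cols = valid_columns(n)
    return [[1 if a & b == 0 else 0 for b in cols] for a in cols]


def largest_eigenvalue(matrix, iterations=500):
    """Power iteration; adequate here because the matrix is nonnegative and primitive."""
    v = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        lam = max(w)
        v = [x / lam for x in w]
    return lam


def capacity_estimate(n):
    """Finite-n value of (1/n) log2 of the largest eigenvalue of A_n."""
    return math.log2(largest_eigenvalue(merging_matrix(n))) / n
```

The quantity `capacity_estimate(n)` converges slowly in `n`; the increment `math.log2(largest_eigenvalue(merging_matrix(n + 1)) / largest_eigenvalue(merging_matrix(n)))` typically settles much faster.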


Supported  in  part  by  Chinese  NNSF  and  U.S.  NSF  Grant 
NCR-9205265 

Supported in part by U.S. NSF Grant NCR-9205265
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Abstract  —  A  model  representing  the  physical  laws 
which govern magnetic recording media is presented.
Previous results in the analysis of Hopfield neural networks
may be extended and applied to this model to
determine its storage capacity limits.


I.  Introduction 

The central component of any magnetic recording system is
the medium on which information is stored in magnetic patterns.
The storage capacity of a recording medium cannot
exceed the logarithm of the number of distinct magnetic patterns
which can be sustained over time. These limits are fundamental
to the physical nature of a given recording medium,
irrespective of the devices or methods used for recording.


II.  Medium  Model 

Let a medium be represented by a planar array of $N$ square
tiles, indexed by $i \in I$. Let $\theta_i$ be the orientation of the $i$th
tile's easy axis of anisotropy. The $\theta_i$ are random variables
whose values are fixed in the manufacture of the medium,
independently drawn from a uniform distribution on the interval
$[-\pi, \pi]$. The magnetization of tile $i$ is $\mathbf{m}_i = s_i [\cos\theta_i, \sin\theta_i]^T$,
where $s_i \in \{\pm 1\}$.

The normalized, effective magnetic field at tile $i$ is

$$\mathbf{h}_i = K_e \sum_{j \in N(i)} \mathbf{m}_j + K_m \sum_{j \ne i} \frac{1}{(x_{ij}^2 + z_{ij}^2)^{5/2}} \begin{bmatrix} 2x_{ij}^2 - z_{ij}^2 & 3x_{ij}z_{ij} \\ 3x_{ij}z_{ij} & 2z_{ij}^2 - x_{ij}^2 \end{bmatrix} \mathbf{m}_j. \qquad (1)$$

The constants $K_e$ and $K_m$ scale the relative strength of the exchange
interaction, arising from neighbor tiles [$N(i)$], and the
magnetostatic interaction, arising from all tiles. $\mathbf{h}_i$ is resolved
into components parallel and perpendicular to the easy axis of
tile $i$, $h_{\parallel,i}$ and $h_{\perp,i}$, and the magnetic state evolves according
to a modified Stoner-Wohlfarth model [1] update rule,

$$s_i^{\text{new}} = \begin{cases} s_i & h_{\parallel,i}^{2/3} + h_{\perp,i}^{2/3} < 1, \\ \operatorname{sgn}(h_{\parallel,i}) & h_{\parallel,i}^{2/3} + h_{\perp,i}^{2/3} \ge 1. \end{cases} \qquad (2)$$


The  update  is  repeated  at  randomly  selected  tiles  until  a  state 
is  reached  which  undergoes  no  further  changes. 


III.  Capacity  Analysis 

Only fixed points of the update rule are suitable for information
storage. When there are $F_N$ fixed points, the storage
density is limited by $C = \frac{1}{N} \log_2 F_N$, expressed in units of
bits per tile. $C$ is a function of $K_e$, $K_m$, and the $\theta_i$. Figure 1
displays the capacity limit $C$ computed via simulation
for a medium of 16 tiles arranged in a $4 \times 4$ array, using one
realization of orientation angles, $\theta_i$ [2].

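For small values of $K_e$ and $K_m$, all magnetic states are
stable, and $C = 1$. As $K_e$ and $K_m$ increase, the second line
of the update rule has an effect. Let

$$A_i = \{\mathbf{h}_i : h_{\parallel,i}^{2/3} + h_{\perp,i}^{2/3} < 1 \text{ or } s_i h_{\parallel,i} > 0\}, \quad 1 \le i \le N. \qquad (3)$$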

¹This work was supported in part by NSF Grant NCR-94-06197.

Figure 1: Capacity limit as a function of $K_e$ and $K_m$

Application of DeMorgan's Law, the Union Bound, and symmetry
properties of the model yields

$$C \ge 1 + \frac{1}{N} \log_2\left(1 - N \max_i \Pr\{\mathbf{h}_i \notin A_i\}\right). \qquad (4)$$

For large $N$, edge effects may be ignored and the functional
dependence of $\Pr\{\mathbf{h}_i \notin A_i\}$ on $K_e$ and $K_m$ has been estimated
numerically. The results determine that the analytical boundary
of the region of the $K_e$-$K_m$ plane for which all states are
stable is consistent with simulation results.

For large values of $K_e$ and $K_m$, the role of $h_{\perp,i}$ becomes
insignificant, and the model may be simplified to

$$h_{\parallel,i} = \sum_j w_{ij} s_j, \qquad s_i^{\text{new}} = \operatorname{sgn}(h_{\parallel,i}), \qquad (5)$$

with appropriate definitions of the $w_{ij}$. This case corresponds
to the Hopfield model with random weights.

When the $w_{ij}$ are independent standard normal random
variables, for large $N$, the number of fixed points is [3]

$$F_N \approx 1.0505 \cdot 2^{0.2874N}. \qquad (6)$$
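
For small $N$ the fixed points of the simplified sign-update model can be counted by exhaustive search. The Python sketch below is a toy illustration under assumptions not stated in the text: the weights are taken to be symmetric with zero diagonal, a state counts as a fixed point when every tile's local field has the same sign as its current state, and all function names are hypothetical.

```python
import math
import random


def random_symmetric_weights(n, seed=0):
    """Assumed weight model: symmetric standard normal weights, zero diagonal."""
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.gauss(0.0, 1.0)
    return w


def is_fixed_point(w, s):
    """True if s_i agrees in sign with sum_j w_ij s_j for every tile i."""
    n = len(s)
    return all(s[i] * sum(w[i][j] * s[j] for j in range(n)) > 0 for i in range(n))


def count_fixed_points(w):
    """Exhaustive count over all 2^N states; only feasible for small N."""
    n = len(w)
    count = 0
    for bits in range(1 << n):
        s = [1 if (bits >> i) & 1 else -1 for i in range(n)]
        if is_fixed_point(w, s):
            count += 1
    return count


def bits_per_tile(num_fixed_points, n):
    """Storage limit C = (1/N) log2 F_N."""
    return math.log2(num_fixed_points) / n


def asymptotic_bits_per_tile(n):
    """C implied by the large-N estimate F_N ~ 1.0505 * 2^(0.2874 N)."""
    return (math.log2(1.0505) + 0.2874 * n) / n
```

For example, `bits_per_tile(count_fixed_points(random_symmetric_weights(10)), 10)` gives the limit for one random draw at $N = 10$; it need not be close to the large-$N$ figure, which `asymptotic_bits_per_tile` evaluates directly.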

Thus,  a  storage  capacity  limit  of  about  0.29  bits  per  tile  would 
prevail  if  independent,  identically  distributed,  zero-mean 
Gaussian  random  weights  accurately  reflected  the  medium 
model.  For  a  typical  tile  size,  the  corresponding  areal  stor¬ 
age  density  limit  is  about  116  Gbits  per  square  inch.  Such 
analysis  must  be  extended  to  determine  the  capacity  limits 
when  Wij  are  given  by  the  medium  model. 
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I.  Introduction 

We consider the problem of determining the maximum entropy
of a discrete random field on a lattice subject to certain
local constraints on symbol configurations. The results are expected
to be of interest in the analysis of digitized images and
two dimensional codes. We shall present some examples of binary
and ternary fields with simple constraints. Exact results
on the entropies are known only in a few cases, but we shall
present close bounds and estimates that are computationally
efficient.

II.  Fields  with  Simple  Constraints 

We  consider  random  variables  on  a  rectangular  grid,  x(i,j). 
The  lattice  is  defined  by  the  set  of  neighbors  associated  with  a 
given  point.  We  shall  not  assume  that  the  probability  distri¬ 
bution  is  given,  but  the  structure  of  the  field  will  be  specified 
in  terms  of  a  set  of  constraints  on  the  values  assumed  by  a 
particular  variable  and  its  neighbors.  Constraints  could  be  of 
one  of  the  following  (not  necessarily  distinct)  types: 

-  the runs of pixels of a given color should satisfy a set of
inequalities [1]

-  the field is a random tiling of the plane with certain pieces [2]

-  certain configurations of values are excluded

Since  we  are  interested  in  estimates  of  the  entropy  which 
may  be  related  to  coding  and  data  compression,  we  consider 
fields  which  are  obviously  stationary.  The  existence  of  solu¬ 
tions  to  the  constraints  should  not  be  a  problem,  and  bound¬ 
ary  conditions  should  not  be  important. 

Example 1: As a simple example we shall consider the following
problem which is quite well-known: Consider a binary
field on a rectangular lattice with the restriction that two
neighbors, i.e. $x(i,j)$ and $x(i,j+1)$ or $x(i,j)$ and $x(i+1,j)$,
cannot both have the value 1. What is the largest possible
entropy, or what is the number of solutions for an $N$ by $N$
segment of the lattice as a function of $N$? We estimate the
entropy to be $H \approx 0.587891161775339$.
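
The counting question in Example 1 can be answered exactly for small $N$ by a row-by-row recursion. The Python below is a minimal sketch with hypothetical function names: rows are encoded as bit masks, and two rows may be stacked when they share no position holding a 1.

```python
import math


def count_configurations(n):
    """Number of n-by-n binary arrays in which no two neighbors are both 1."""
    # Rows with no two horizontally adjacent ones.
    rows = [r for r in range(1 << n) if r & (r >> 1) == 0]
    # ways[r] = number of valid partial arrays whose last row is r.
    ways = {r: 1 for r in rows}
    for _ in range(n - 1):
        ways = {
            r: sum(w for prev, w in ways.items() if prev & r == 0)
            for r in rows
        }
    return sum(ways.values())


def entropy_per_site(n):
    """Finite-size estimate log2(count) / n^2, in bits per lattice site."""
    return math.log2(count_configurations(n)) / (n * n)
```

The finite-size value `entropy_per_site(n)` is influenced by boundary effects, so it approaches the limiting entropy only slowly as `n` grows.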

III.  Markov  Chains 

As suggested in [1], the maximal entropy may be bounded
by the entropy of a band of finite width, i.e. the variable $j$ is
restricted to $0 \le j < m$. This entropy can be calculated as
the maximal entropy of a finite state Markov chain, and from
this approach we obtain an upper bound (with a suitable relaxation
on the restrictions at the boundaries). This estimate
converges slowly. For the problem of Example 1 we get $H <
0.5928$ for $m = 20$, imposing no restrictions at the boundaries.
By constraining the probability of a 1 at the boundaries, a tighter
bound may be obtained. In some cases it is possible to derive a
very accurate estimate from this sequence of values. The
estimate given in Example 1 was obtained as $H_{m+1} - H_m$ with
$m = 16$.

Another type of estimate may be obtained from finite state
causal models of the field. If the outcome of the process is
generated one pixel at a time, and the probability distribution
of $x(i', j')$ is assumed to depend on a finite past context, $i < i'$
or $i = i'$ and $j < j'$, then the entropy can be approximated
by that of a finite Markov source. This approach gives some
information about the properties of the field, but the model
is only exact in a very simple case, which is discussed in the
following section.

IV.  Construction  of  Stationary  Fields 

An actual construction of a random field with known entropy
is interesting both for simulation purposes and as a
method for establishing lower bounds. It would be very desirable
to have random fields where rows and columns were
described by simple Markov chains. Unfortunately this appears
to be possible only in the case of the Pickard lattices
[3]. This is also the only case where the causal model of the
field becomes a simple finite state source.

Example 2: A Pickard field consistent with the constraint
considered in Example 1 may be constructed such that each
row or column is a Markov chain with $P(1) = 1/5$. In this case
the entropy may be found explicitly as $H = 1/10 + (3/10) \log_2 3
= 0.575\ldots$ Actually a slightly larger value may be obtained by
varying the transition probabilities.

Clearly  the  solution  to  the  maximum  entropy  problem  is 
always  a  Markov  random  field.  However,  in  general  such  fields 
are  hard  to  analyze.  We  shall  consider  a  construction  which 
has  much  greater  flexibility  than  the  Pickard  field,  but  still 
allows  detailed  analysis: 

Let rows $i$ and $i + 1$ be generated by a Markov chain (or
another  unifilar  finite  state  source),  such  that  there  is  com¬ 
plete  symmetry  between  the  two  rows.  Their  joint  entropy 
can  be  easily  calculated.  The  probability  distribution  may  be 
extended  to  a  stationary  distribution  on  the  entire  plane  by 
assuming  that  all  pairs  of  rows  have  the  same  distribution, 
and  that  the  probability  of  each  row  given  the  past  depends 
on  only  the  previous  row.  The  entropy  of  a  row  given  the 
previous  row  may,  in  some  cases,  be  calculated  from  a  hidden 
Markov  source. 

Example 3: For the constraint in Example 1, two successive
rows may be generated by a symmetric 3-state Markov
chain with entropy $H_2$. From this source it is possible to calculate
the entropy of a single row (2), $H_1$, exactly. Whenever
$x(i,j) = 1$, the state of the source is known, and the distribution
of zero runs can be calculated. We find the entropy of the
process as $H = H_2 - H_1$. The largest lower bound obtained
in this way is 0.58783.
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Abstract  —  The  shift-  and  scale-orthogonality  prop¬ 
erties  of  wavelets  and  scaling  functions  provide  a 
means  of  producing  signal  spaces  of  arbitrary  dimen¬ 
sionality.  The  signal  spaces  are  presented  with  an 
example trellis code on the $E_6$ lattice.

I.  Introduction 

Let $\psi_{jk}(t) = 2^{j/2}\psi(2^j t - k)$ and $\phi_{jk}(t) = 2^{j/2}\phi(2^j t - k)$
represent scaled and shifted wavelet and scaling functions, respectively,
where we take $j, k \in \mathbb{Z}$. Normalized orthogonal
wavelets and scaling functions have the following orthogonality
properties:

$$\langle \psi_{jk}, \psi_{lm} \rangle = \delta_{j,l}\,\delta_{k,m}$$

$$\langle \psi_{jk}, \phi_{lm} \rangle = 0 \quad \forall j, k, l, m$$

$$\langle \phi_{jk}, \phi_{jm} \rangle = \delta_{k,m}.$$

In addition, these functions have attractive frequency localization:
$\phi(t)$ is a low-pass function and $\psi(t)$ is a band-pass
function. A family of wavelets of interest is the compactly
supported orthogonal wavelets introduced by I. Daubechies.
Members of these families are denoted by $D_N$, $N$ even, where
$N$ is the number of coefficients in the two-scale implicit description
of the scaling function $\phi(t) = \sum_{n=0}^{N-1} c_n \phi(2t - n)$,
which has support over $[0, N - 1)$. The regularity (and hence
the frequency localization) of members of $D_N$ increases with
$N$.
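
The shift- and scale-orthogonality relations can be checked numerically for the simplest compactly supported case. The sketch below assumes the Haar pair (which corresponds to $N = 2$) purely for illustration, because its functions are piecewise constant and a midpoint sum on a dyadic grid integrates them exactly; the helper names are hypothetical. The cross term is only checked for a wavelet at the same or a finer scale than the scaling function.

```python
def haar_scaling(t):
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def haar_wavelet(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def scaled_shifted(base, j, k):
    """Return t -> 2^(j/2) * base(2^j t - k)."""
    return lambda t: 2.0 ** (j / 2.0) * base(2.0 ** j * t - k)


def psi(j, k):
    return scaled_shifted(haar_wavelet, j, k)


def phi(j, k):
    return scaled_shifted(haar_scaling, j, k)


def inner_product(f, g, lo=-4.0, hi=4.0, steps_per_unit=64):
    """Midpoint-rule integral of f*g over [lo, hi) on a dyadic grid."""
    h = 1.0 / steps_per_unit
    n = int((hi - lo) * steps_per_unit)
    return h * sum(f(lo + (i + 0.5) * h) * g(lo + (i + 0.5) * h) for i in range(n))
```

With a grid spacing of 1/64 the sums are exact (up to floating-point rounding) for scales up to about $j = 4$; finer scales would need a finer grid.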

Considerable  attention  has  been  focused  on  wavelet  and 
scaling  functions  over  the  last  few  years,  due  to  their 
time/frequency  localization  ability  and  the  existence  of  fast 
transform  algorithms.  In  this  paper,  we  introduce  the  applica¬ 
tion  of  wavelets  and  scaling  functions  as  baseband  waveforms 
for  the  transmission  of  digital  signals. 

Using a basic bit time normalized to unity, a baseband
signal may be written as

$$s(t) = \sum_n I_n \psi(t - n)$$

where $\{I_n\}$ represents an alphabet of symbols drawn from a
(possibly complex) signal constellation $C$. For $\psi \in D_N$ for
$N > 4$, this signalling scheme has better spectral localization
than MSK.

Signalling with several scales of wavelet functions may be
written as

$$s(t) = \sum_n \sum_{s \in S} I_{s,n} \psi_{s,n}(t) + \sum_n J_n \phi_{\sigma,n}(t)$$

where $S$ is a set of scale indices, $\sigma$ is a single fixed scale,
and $J_n$ is drawn from a constellation which may be empty
(if the scaling function is not used for transmission). Table
1 indicates the dimensions for transmission with various sets
of  scales,  where  the  scale  notation  is  as  follows:  If  a  scaling 
function  is  used  on  a  given  scale,  it  is  listed  before  the  wavelet 
function  for  that  scale.  Absence  of  a  wavelet  on  a  particular 
scale  is  indicated  by  — . 


Table  1:  Dimensionalities  obtainable  using  multiple 
scales. 


Num.  of 

Function 

Num. 

Levels 

Listing 

Dim. 

2 

iftip 

3 

2 

3 

2 

5 

2 

ib<b— 

3 

2 

4 

3 

'ijj'lpll) 

7 

3 

11 

3 

(f)  —  'ifc'tf) 

7 

3 

9 

3 

xjj(f)  — 

7 

3 

8 

3 

Ip'l/jcj)  — 

7 

II.  Trellis  coding  in  wavelet  signal  spaces 

Multiple-scale signalling provides a spectrally efficient way of
producing arbitrary dimensionalities. The dimensions for coding
available using multiple-scale signalling provide a rationale for
exploring trellis codes in dimensions other than those usually
explored. Due to limited space, we present only one example.

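The code is based upon the construction of Calderbank and
Sloane over the $E_6$ lattice. This is a lattice $\Lambda$ with generator
$M$ over the Eisenstein integers and endomorphism $\theta$,

$$M = \begin{bmatrix} (1,2) & 0 & 0 \\ 0 & (1,2) & 0 \\ 1 & 1 & 1 \end{bmatrix}, \qquad \theta = \begin{bmatrix} (1,2) & 0 & 0 \\ 0 & (1,2) & 0 \\ 0 & 0 & (1,2) \end{bmatrix},$$

where $(a, b)$, $a, b \in \mathbb{Z}$, represents the Eisenstein integer $a + b\omega$,
$\omega = (-1 + i\sqrt{3})/2$. The sublattice $\Lambda'$ generated by $M' = M\theta$
produces a quotient group $\Lambda/\Lambda'$ with 64 cosets.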

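A generator for a four-state convolutional coder is

$$g = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$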

This gives coding gains (for example) of 2.43 dB when $k_2 = 0$,
2.4 dB when $k_2 = 1$, and 3.4 dB when $k_2 = 2$. A computer
search for better codes, both in $E_6$ and its dual $E_6^*$, is underway.

III.  Extension 

There  is  fertile  ground  for  trellis  code  developments  in  other 
dimensions  and  for  application  of  multi-scale  signalling  to  fad¬ 
ing  channels. 
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Abstract  —  We  study  nonparametric  estimates  of 
$E[Y_n \mid X_n]$ of the form $\sum_{i=1}^{n-1} W_{ni}(X_1, \ldots, X_n) Y_i$ based on
$X_n$ and data $\{(X_i, Y_i)\}_{i=1}^{n-1}$. Our work analyzes the case
where $\{X_i\}$ is a completely arbitrary random process.
Conditions  on  the  weights  are  established  so  that  the 
time-average  of  the  estimation  errors  converges  to 
zero.  One  consequence  of  our  work  is  a  recovery  and 
extension  of  some  classical  results  to  stationary  pro¬ 
cesses  in  separable  metric  spaces. 

I.  Preliminaries 

Let $X_1, X_2, \ldots, X_n$ be an arbitrary random process taking
values in a compact subset of a separable metric space $(\mathcal{X}, \rho)$.
Special cases include nonstationary or nonergodic processes
and deterministic sequences. Each $X_i = x_i$ has an associated
label $Y_i$ which is a random variable, taking values
in $\mathbb{R}$, drawn from an unknown conditional distribution
$F(y \mid X_i = x_i)$. A classical problem is to nonparametrically
estimate the regression function $m(X_n) = E[Y_n \mid X_n]$
given $X_n$ and additional data pairs $\{(X_i, Y_i)\}_{i=1}^{n-1}$. Following
the seminal work in [6], we consider estimators of the
form $\hat{m}_n(X_n) = \sum_{i=1}^{n-1} W_{ni}(X_1, \ldots, X_n) Y_i$ where
$W_n = \{W_{ni}(X_1, \ldots, X_n)\}_{i=1}^{n-1}$ is a sequence of probability weights
that satisfy certain conditions.

Most previous work considered the case in which $\{(X_i, Y_i)\}$
are i.i.d. Stone [6] proved that certain conditions on the
weights guarantee consistency without making any assumptions
on the distributions other than that $\{(X_i, Y_i)\}$ are i.i.d.
and that $EY^2 < \infty$. Universal consistency of specific estimators
was discussed in, e.g., [2, 4]. In this paper, we impose
no restrictions on the random process except that $\{X_i\}$ take
values in a compact subset of a separable metric space and
that the following assumption holds:

(A0) For each $i$ and for every measurable set $S$,

$$\Pr(Y_i \in S \mid X_1, \ldots, X_n, Y_1, \ldots, Y_{i-1}) = \Pr(Y_i \in S \mid X_i).$$

This is a very general setting and as a result, in general, one
cannot expect to design a consistent estimator. However, we
show that when restricted to continuous regression functions,
conditions on the weights, analogous to
those in [6], ensure time-average consistency, i.e., the time-averaged
estimation errors go to zero for every random process.
Our proof techniques are elementary sample path analyses
and are in line with current trends in information theory
and statistics to evaluate performance in terms of individual
data sequences. See, for example, [1, 3, 5].

We impose the following assumptions on $F(y \mid x)$:

(A1) $E[Y^2 \mid X = x] < \infty$,

(A2) $m(x) = E[Y \mid X = x]$ is a continuous function.

1This  work  was  supported  in  part  by  the  National  Science  Foun¬ 
dation  under  grants  IRI-9209577  and  IRI-9457645  and  by  the  U.S. 
Army  Research  Office  under  grant  DAAL03-92-G-0320. 


II.  Consistency 

Our result is an extension of [6] to separable metric spaces
and to arbitrary random processes. To achieve the latter, we
can only conclude that the time-averaged squared loss goes to
zero. Accordingly, it is natural that our statements on the
weights are time-averaged versions of those in [6].

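Theorem 1 Let $\Omega_n = \{X_i\}_{i=1}^{n}$ be an arbitrary random process
in a compact subset $A \subset (\mathcal{X}, \rho)$ with $\{(X_i, Y_i)\}_{i=1}^{n}$ satisfying
(A0). Let $\{W_n\}$ be probability weights. If

(C1) $\frac{1}{N} \sum_{n=2}^{N} \sum_{i=1}^{n-1} W_{ni}(\Omega_n) I(\rho(X_n, X_i) > \epsilon) \to 0 \quad \forall \epsilon > 0$,

(C2) $\frac{1}{N} \sum_{n=2}^{N} \max_i W_{ni}(\Omega_n) \to 0$,

hold a.s., then for $F(y \mid x)$ that satisfies (A1) and (A2) we have

$$\frac{1}{N} \sum_{n=2}^{N} E[|\hat{m}_n(X_n) - m(X_n)|^2 \mid \Omega_n] \to 0.$$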

Applying this theorem to the $k_n$-nearest neighbor and kernel
algorithms, we can show that (C1)-(C2) are satisfied by the
standard conditions $k_n \to \infty$, $k_n/n \to 0$ and $\epsilon_n \to 0$, $n\epsilon_n^r \to \infty$
(in $\mathbb{R}^r$), respectively. This means that these universally consistent
estimates are, in fact, also universally time-averaged consistent
for every random process $\{X_i\}$. As a corollary, we have the
following partial extension of the pointwise consistency result
of [6] to stationary processes in general metric spaces for continuous
regression functions.
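
To make the weighted form of the estimator concrete, the Python below sketches the nearest-neighbor case on the real line: the $k$ nearest past points each receive weight $1/k$ and the rest receive zero, so the weights are probability weights. The absolute-difference metric, the stable tie-breaking, and all function names are illustrative assumptions; the time-averaged error uses a known regression function, which is only available in a simulation.

```python
def knn_weights(past_x, x_n, k):
    """Probability weights: 1/k on the k past points nearest to x_n, else 0."""
    k = min(k, len(past_x))
    order = sorted(range(len(past_x)), key=lambda i: abs(past_x[i] - x_n))
    chosen = set(order[:k])
    return [1.0 / k if i in chosen else 0.0 for i in range(len(past_x))]


def knn_estimate(past_x, past_y, x_n, k):
    """Weighted-average estimate sum_i W_ni * Y_i at the point x_n."""
    weights = knn_weights(past_x, x_n, k)
    return sum(w * y for w, y in zip(weights, past_y))


def time_averaged_squared_error(xs, ys, m, k_of_n):
    """(1/N) * sum over n = 2..N of |estimate at X_n - m(X_n)|^2."""
    total = 0.0
    for n in range(2, len(xs) + 1):
        x_n = xs[n - 1]
        estimate = knn_estimate(xs[: n - 1], ys[: n - 1], x_n, k_of_n(n))
        total += (estimate - m(x_n)) ** 2
    return total / len(xs)
```

A choice such as `k_of_n = lambda n: max(1, round(n ** 0.5))` grows without bound while keeping `k_of_n(n) / n` small, in the spirit of the nearest-neighbor conditions quoted above.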

Theorem 2 Let $\Omega_n = \{X_i\}_{i=1}^{n}$ be a stationary process in a
compact subset $A \subset (\mathcal{X}, \rho)$ with $\{(X_i, Y_i)\}_{i=1}^{n}$ satisfying (A0).
Let $\{W_n\}$ be a sequence of weights. If

(S1) $E \sum_{i=1}^{n-1} W_{ni}(\Omega_n) I(\rho(X_n, X_i) > \epsilon) \to 0$, for all $\epsilon > 0$,

(S2) $E \max_i W_{ni}(\Omega_n) \to 0$,

hold, then for $F(y \mid x)$ that satisfies (A1) and (A2) we have

$$E[|\hat{m}_n(X_n) - m(X_n)|^2] \to 0.$$

We have also shown Theorem 2 to hold when (A2) is omitted
and (A1) is relaxed to $EY^2 < \infty$.
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Abstract  —  Closed-form  solutions  are  presented 
for  function  estimation  using  the  orthogonal  series 
method  and  various  model  selection  criteria.  While 
Akaike’s  AIC  criterion  does  not  lead  to  consistent  es¬ 
timates,  a  family  of  criteria  that  includes  Minimum 
Description  Length  operates  within  a  logarithmic  fac¬ 
tor  of  the  minimax  rate  in  a  range  of  Sobolev  smooth¬ 
ness  classes. 

I.  Model  for  Function  Estimation 
Consider the following model: $y_i = f(i/N) + \epsilon_i$, $0 \le i < N$,
where $\epsilon_i$ are iid $N(0, 1)$ and the sampled, unknown function $f$
admits the representation $f = \sum_{n=0}^{N-1} \theta_n \phi_n$ in the orthonormal
basis $\{\phi_n\}$ of $\mathbb{R}^N$. We investigate estimators of the form
$\hat{f} = \sum_{n=0}^{N-1} \hat{\theta}_n \phi_n$ where $\hat{\theta}$ has $k \le N$ nonzero components and $\hat{\theta}$
and $k$ are chosen so as to maximize the criterion

$$-\frac{1}{2} \sum_{i=0}^{N-1} |y_i - \hat{f}(i/N)|^2 - C_N k \qquad (1)$$

where the constant $C_N = 1$ (AIC [1]), or $C_N = \frac{3}{2} \log_2 N$
(MDL* [2]¹), or $C_N = \ln N$ (DJ, see below). Other model
selection criteria (choices of $C_N$) may be considered. Related
work may be found in [3], where the largest model order is
restricted to be $o(N)$, and [4].

II.  Basic  Results 

Proposition 1. The maximizer of (1) is $\hat{\theta}_n = T_\lambda(r_n)$,
where $r_n = \langle y, \phi_n \rangle$ and

$$T_\lambda(r) = \begin{cases} r & : \ |r| > \lambda \\ 0 & : \ \text{else} \end{cases} \qquad (2)$$

is the "hard threshold" function, with threshold $\lambda = \sqrt{2C_N}$.

Proof. By orthonormality of $\{\phi_n\}$ and Parseval's theorem,
the criterion (1) takes the form $-\frac{1}{2} \sum_{n=0}^{N-1} |r_n - \hat{\theta}_n|^2 - C_N k =
\sum_{n=0}^{N-1} L_n(\hat{\theta}_n)$ where

$$L_n(\theta_n) = \begin{cases} -\frac{1}{2}|r_n - \theta_n|^2 - C_N & : \ \text{if } \theta_n \ne 0 \\ -\frac{1}{2}|r_n|^2 & : \ \text{else.} \end{cases}$$

The criterion may thus be maximized over each coordinate
independently. The maximizer of $L_n(\theta_n)$ is $\hat{\theta}_n = r_n$ if
$-C_N > -\frac{1}{2}|r_n|^2$ and $\hat{\theta}_n = 0$ otherwise. The statement of
the proposition follows directly. □

Prop. 1 establishes the fundamental role of thresholding
in solving (1) and admits a hypothesis-testing interpretation.
It also provides a closed-form expression for $\hat{\theta}$ which may be
used to evaluate various properties of $\hat{f}$ for different choices
of $\{\phi_n\}$ and $C_N$. For instance, when $\{\phi_n\}$ is a wavelet basis
and $C_N \sim \ln N$, the estimator is almost minimax over a wide
class of functions [5].

A basic problem is to evaluate the performance of the AIC
and MDL* criteria in (1). We use the squared $\ell^2$ risk

$$R_N(\hat{f}) = E\left[\frac{1}{N} \sum_{n=0}^{N-1} |\hat{f}(n/N) - f(n/N)|^2\right] = \frac{1}{N} \sum_{n=0}^{N-1} \rho(\lambda, \theta_n) \qquad (3)$$

where $\rho(\lambda, \theta) = E|T_\lambda(r) - \theta|^2 = (\theta^2 - 1)[\Phi(\lambda - \theta) - \Phi(-\lambda - \theta)]
+ 1 + (\lambda - \theta)\varphi(\lambda - \theta) + (\lambda + \theta)\varphi(\lambda + \theta) \ge \lambda^2 \Phi(-3\lambda)$, and
$\varphi(\cdot)$ and $\Phi(\cdot)$ are respectively the normal pdf and cdf.

Proposition 2. For every choice of basis $\{\phi_n\}$,

(i) a necessary condition for consistency of $\hat{f}$ in the $R_N$ sense
is $C_N \to \infty$ as $N \to \infty$.

(ii) the AIC estimator is not consistent.

Proof. Using (3) and the lower bound on $\rho(\lambda, \theta)$ we obtain
a necessary condition for $R_N(\hat{f}) \to 0$: $\lambda^2 \Phi(-3\lambda) \to 0$, hence
(i). (ii) follows immediately. □


III. Estimation Using Fourier Series
Convergence rates are computed for Fourier series and functions
in the $L^2$ Sobolev ball

$$W_2^s(R) = \left\{f : \int_0^1 |f^{(s)}(t)|^2 \, dt \le R^2 < \infty, \ f^{(i)}(0) = f^{(i)}(1), \ 0 \le i < s\right\}.$$

The estimator does not know $s$ or $R$.

Proposition 3. For the choice

$$C_N = \beta \ln N, \qquad 2s/(2s+1) < \beta < \infty, \qquad (4)$$

the risk is $R_N(\hat{f}) \le C_{R,s} (\beta N^{-1} \ln N)^{2s/(2s+1)}$, where $C_{R,s}$
does not depend on $N$ or $\beta$. The rate above is attained by the
MDL* ($\beta = \frac{3}{2 \ln 2} \approx 2.16$) and DJ ($\beta = 1$) estimators and is
within a logarithmic factor of the minimax rate [6].

¹This particular formulation of the MDL criterion accounts for
the familiar $\frac{k}{2} \log_2 N$ bits for encoding the $k$ real parameters $\hat{\theta}_n$
and $k \log_2 N$ bits for encoding their index (assuming a uniform prior
on indices).

Abstract — Modified kernel estimators calculated
from compressed data for density estimation and signal
recovering problems are proposed. An asymptotically
optimal compression technique utilizing the
quantile process and data binning is employed. The
statistical accuracy of the introduced kernel estimators
is studied, i.e., we derive mean squared error results
for the closeness of these estimators to both
the true functions and the kernel estimators determined
from noncompressed data.

I. INTRODUCTION

The problem of estimating a nonparametric function $f(t)$ from
a finite data record has received a great deal of attention in
recent years in both Information Theory and Statistics.
Two important models for $f(t)$ include density estimation
and function recovering. In the former case $f(t)$ is
a density function of a random variable $X$, whereas in the
latter situation $f(t)$ is a signal observed in the presence of
noise. The estimation problem in both settings can be formulated
as follows. Given a sequence of independent random
variables $\{X_1, \ldots, X_n\}$ distributed as $X$ we wish to estimate
the density $f(t)$. In the second case one observes
$y_j = f(t_j) + \epsilon_j$ at points $\{t_j\}$ and wishes to recover $f(t)$. A
popular nonparametric technique for recovering $f(t)$ is the kernel
estimator, defined as $\hat{f}(t) = n^{-1} \sum_{i=1}^{n} K_h(t - X_i)$, where
$K_h(t) = K(t/h)/h$, $K$ is a kernel function and $h$ is a smoothing
parameter. As for the curve recovering problem we can
use $\hat{f}(t) = \sum_{j=1}^{n} y_j \Delta_j K_h(t - t_j)$, where $\Delta_j = t_j - t_{j-1}$ and
$t_0 < t_1 < \cdots < t_n$ is an ordered sequence.

Under suitable conditions for the kernel function and the
sequence $\{t_j\}$ it is known that if $f \in C^s(R)$ then
$E(\hat{f}(t) - f(t))^2 = O(n^{-2s/(2s+1)})$ for $h$ being selected optimally as
$c n^{-1/(2s+1)}$. Nevertheless, the aforementioned estimators
need $O(n)$ evaluations at each point $t$ and this is often a prohibitive
complexity. It is our aim in this paper to propose
modified versions of the kernel estimators with a substantially
reduced computational complexity and yet with the asymptotically
optimal rate $O(n^{-2s/(2s+1)})$.

also to $\hat{f}(t)$. Hence, $\tilde{f}(t)$ can be treated as a new estimate
of $f(t)$, or we could think of $\tilde{f}(t)$ as being a compressed approximation
to the estimator $\hat{f}(t)$. The general form of such
estimators for the density estimation problem is the following:
$\tilde{f}(t) = \sum_{j=1}^{N} n_j K_h(t - a_j) / \sum_{j=1}^{N} n_j$, where $a_1, \ldots, a_N$ are
center points representing the partition of the data set into
$N$ clusters, $n_j = \sum_{i=1}^{n} W_j(X_i)$, where $W_j(x)$ is the weight of
assigning $x$ to the $j$th cluster.

For given center points $\{a_1, \ldots, a_N\}$ one would like to select
$N$ yielding the largest possible compression ratio. On
the other hand, we wish to preserve the rate of convergence
attained by the classical kernel estimator. These two conflicting
factors allow us to select $N$ as a function of $n$ yielding
the desired convergence rate. Recommendations concerning the
choice of $h$ and $N$ are presented. In particular, it is shown that
$N$ can be selected as $N = c n^{s/2(2s+1)}$, $f \in C^s(R)$, for a certain
rule of generation of the center points $\{a_1, \ldots, a_N\}$.

REFERENCES

[1] W. Greblicki and M. Pawlak, "Dynamic system identification
with order statistics", IEEE Trans. Inform. Theory, 40, 1474-1489, 1994.

[2] S. Cambanis and N. L. Gerr, "A simple class of asymptotically
optimal quantizers", IEEE Trans. Inform. Theory, 29, 664-676, 1983.

[3] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression,
Kluwer, Boston, 1992.

[4] M. Pawlak and U. Stadtmüller, "Kernel estimators with compressed
data", Technical Report, 1995.


II.  KERNEL  ESTIMATORS  FROM 
COMPRESSED  DATA 

The  reduced  complexity  kernel  estimators  can  be  designed  first 
by  utilizing  some  compression  techniques  to  the  original  data 
set  followed  by  a  binning  process  applied  to  the  classicial  es¬ 
timators.  We  utilize  a  compression  technique  employing  the 
quantile  process  generated  by  a  certain  density  function.  A 
question  of  considerable  practical  importance  concerns  the  ac¬ 
curacy  of  our  new  kernel  estimators  based  on  such  compressed 
data.  Hence,  if  /(£)  denotes  a  kernel  estimate  from  the  com¬ 
pressed  data  then  we  examine  how  close  /(£)  is  to  /(£)  and 

1  Research  supported  by  NSERC  Grant  A8131  and  Humboldt 
Foundation  _  __ 
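As a rough illustration of the cost difference between the classical kernel estimate and an estimate built from binned data, the following sketch compares the two at a single point. It is only a minimal example under stated assumptions: the Gaussian kernel, the equal-width binning rule (rather than the quantile-based compression described in the text), and all function names are hypothetical choices made for the illustration.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(t, data, h):
    """Classical estimate (1/n) * sum_i K_h(t - X_i) with K_h(u) = K(u/h)/h: O(n) per point."""
    return sum(gaussian_kernel((t - x) / h) for x in data) / (len(data) * h)

def bin_data(data, num_bins):
    """Compress the sample into equal-width bins; returns (centers a_j, counts n_j)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins or 1.0   # guard against a degenerate sample
    counts = [0] * num_bins
    for x in data:
        j = min(int((x - lo) / width), num_bins - 1)
        counts[j] += 1
    centers = [lo + (j + 0.5) * width for j in range(num_bins)]
    return centers, counts

def binned_kde(t, centers, counts, h):
    """Compressed estimate sum_j n_j K_h(t - a_j) / sum_j n_j: O(N) per point."""
    total = sum(counts)
    weighted = sum(c * gaussian_kernel((t - a) / h) for a, c in zip(centers, counts))
    return weighted / (total * h)

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
h = 0.3
centers, counts = bin_data(sample, 40)
full = kde(0.0, sample, h)
compressed = binned_kde(0.0, centers, counts, h)
print(full, compressed)
```

Here each bin assigns weight one to the points it contains, so the counts play the role of the cluster weights, and evaluating the compressed estimate touches 40 centers instead of 2000 observations.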
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Abstract  —  We  define  a  regression  function  estimate 
based  on  complexity  regularization,  where  the  list  of 
candidate  functions  and  the  corresponding  penalties 
are  determined  from  the  training  data,  leading  to  im¬ 
proved  performance. 

Let $(X, Y)$ be an $\mathbb{R}^d \times \mathbb{R}$-valued pair of random variables with regression function $m(x) = E\{Y \mid X = x\}$. We assume that $Y$ (and therefore also $m(X)$) is bounded with probability one, i.e., $P\{|Y| \le B\} = 1$ for some $B < \infty$.

The regression function $m(x)$ is to be estimated based upon the data $T_n = ((X_1, Y_1), \ldots, (X_n, Y_n))$, where the $(X_i, Y_i)$ pairs are independent copies of $(X, Y)$. A regression function estimate is thus a function $m_n : \mathbb{R}^d \times (\mathbb{R}^d \times \mathbb{R})^n \to \mathbb{R}$, whose performance is measured by the squared error

$$J(m_n) = E\{(m_n(X, T_n) - Y)^2 \mid T_n\} - E\{(m(X) - Y)^2\} = \int (m_n(x, T_n) - m(x))^2 \mu(dx),$$

where $\mu$ denotes the distribution of $X$. We will use the shorthand notation $m_n(x)$ instead of $m_n(x, T_n)$.

Complexity regularization (see Barron and Cover [2], Barron [1]) selects an estimate $m_n$ from a countable list of candidates $\mathcal{F}_n$ by minimizing the sum of the empirical error

$$\frac{1}{n} \sum_{i=1}^{n} (f(X_i) - Y_i)^2$$

and an appropriately defined complexity penalty $\Delta_n(f)$ over $f \in \mathcal{F}_n$. Intuitively, more "complex" candidates are penalized more heavily in order to avoid overfitting. Barron [1] proves that if the best candidate in $\mathcal{F}_n$ is close to $m$, then the method indeed performs extremely well. In particular,

[...]

where the penalties $\Delta_n(f)$ are required to satisfy the summability constraint $\sum_{f \in \mathcal{F}_n} 2^{-\Delta_n(f)} < \infty$ for each $n \ge 1$.

Our goal is to assure that the list of candidates contains elements, with not too large complexity penalties, that closely approximate the regression function. We let the data determine the list of functions, which adds a tremendous amount of flexibility, leading to improved performance.

The basic idea is to split the data in two such that the first half

$$T_n^1 = ((X_1, Y_1), \ldots, (X_m, Y_m))$$

is used to determine the (random) list of functions $\mathcal{F}_n$ and the corresponding penalties, and the second half

$$T_n^2 = ((X_{m+1}, Y_{m+1}), \ldots, (X_n, Y_n))$$

is used to carry out minimization of the empirical error. For convenience, we take $m = \lfloor n/2 \rfloor$, but other choices such as $m \approx \sqrt{n}$ may be better in certain cases.

As the first step towards defining our estimate, consider a sequence $\mathcal{A}_1, \mathcal{A}_2, \ldots$ of classes of bounded functions $\mathbb{R}^d \to [-B, B]$. These classes may be uncountable, but to avoid certain problems of measurability, we assume that every model class $\mathcal{A}_i$ contains a countable subclass $\mathcal{A}_i^*$ with the property that every $f \in \mathcal{A}_i$ is a pointwise limit of a sequence of functions from $\mathcal{A}_i^*$.

Following ideas of Buescher and Kumar [3], we construct a proper minimal empirical cover of each class $\mathcal{A}_i$ based upon the data $X_1^m = X_1, \ldots, X_m$, i.e., for each $i$, we take a set $\mathcal{G}_i \subset \mathcal{A}_i$ with the following properties. For every $f \in \mathcal{A}_i$ there exists $g \in \mathcal{G}_i$ such that

[...]

and $\mathcal{G}_i$ has minimal cardinality. Denote it by $|\mathcal{G}_i| = N(X_1^m, \mathcal{A}_i)$.

To each $f \in \mathcal{G}_i$, we assign the complexity penalty

$\Delta_i(f, X_1^m) =$ [...],

where the $c_i$'s are required to satisfy the Kraft-type inequality $\sum_i e^{-c_i} \le 1$. Define the estimate $m_n$ as a function that minimizes the penalized empirical error

$$\frac{1}{n - m} \sum_{j=m+1}^{n} (f(X_j) - Y_j)^2 + \Delta_i(f, X_1^m)$$

over all $f \in \mathcal{G} = \bigcup_{i=1}^{\infty} \mathcal{G}_i$.

For the performance of the estimate we have the following result.

Theorem 1 For a universal constant $C$,

$$E\,J(m_n) \le C \inf_{i \ge 1} \Bigl( [\ldots]\,\bigl(\log E\,N(X_1^m, \mathcal{A}_i) + c_i\bigr) + \inf_{f \in \mathcal{A}_i} J(f) \Bigr).$$

The main improvement in our result is that in the second term, the infimum is taken over an uncountable collection.

1 The research was supported in part by the National Science Foundation under Grants No. NCR-92-96231 and INT-93-15271.
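The data-splitting idea can be sketched in a few lines. The example below is a simplified stand-in, not the construction of the paper: instead of a minimal empirical cover, the first half of the data produces one histogram regressor per class, and the penalty (cell count plus a Kraft-type weight, divided by the size of the second half) is a hypothetical choice made only so that the selection step can be run.

```python
import math
import random

def fit_histogram(points, cells):
    """Piecewise-constant fit on [0, 1): mean of y within each of `cells` equal cells."""
    sums, counts = [0.0] * cells, [0] * cells
    for x, y in points:
        j = min(int(x * cells), cells - 1)
        sums[j] += y
        counts[j] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def predict(levels, x):
    """Evaluate a histogram regressor at x."""
    return levels[min(int(x * len(levels)), len(levels) - 1)]

def select_estimate(data, max_level=6, scale=1.0):
    """First half builds the candidate list; second half minimizes error plus penalty."""
    m = len(data) // 2
    first, second = data[:m], data[m:]
    best = None
    for i in range(1, max_level + 1):
        levels = fit_histogram(first, 2 ** i)            # candidate from class i
        error = sum((predict(levels, x) - y) ** 2 for x, y in second) / len(second)
        c_i = i * math.log(2.0)                           # sum_i exp(-c_i) <= 1
        penalty = scale * (2 ** i + c_i) / len(second)    # assumed form of the penalty
        if best is None or error + penalty < best[0]:
            best = (error + penalty, i, levels)
    return best[1], best[2]

rng = random.Random(1)
xs = [rng.random() for _ in range(400)]
data = [(x, 1.0 if x < 0.5 else 0.0) for x in xs]        # noiseless step function
level, levels = select_estimate(data)
print(level, levels)
```

For this noiseless step function every resolution fits the second half exactly, so the penalty alone decides and the coarsest class is selected.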
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Abstract — Suppose we observe $x_1, x_2, \ldots \in X$ drawn
i.i.d. according to some unknown distribution $P$ selected from a family of distributions $\mathcal{P}$. Let $f : \mathcal{P} \to \Lambda$
be a parameterization of $P \in \mathcal{P}$. In this paper, we
study necessary and sufficient conditions for the existence of strongly consistent estimators of $f(P)$. A
number of previous results along these lines are special cases of our main result.

I.  Introduction  and  Formulation 

This  paper  is  concerned  with  characterizing  when  strongly 
consistent  estimators  exist  for  hypothesis  testing  and  estima¬ 
tion  problems  such  as  those  posed  in  the  abstract.  Our  in¬ 
terest  in  this  problem  stems  from  the  work  of  Cover  [1]  and 
subsequent  work  [5,  8,  6,  2,  4]  that  significantly  generalized 
the  set-up  in  [1].  This  paper  presents  an  extension  and  alter¬ 
native  approach  to  recent  results  of  Dembo  and  Peres  [2].  The 
flavor  of  the  results  is  also  along  the  lines  of  the  classical  work 
of  Hoeffding  and  Wolfowitz  [3]  and  LeCam  and  Schwartz  [7]. 

Let $X$ (the sample space) be a complete, separable metric space (e.g., think of $\mathbb{R}^m$), and let $\mathcal{M}_1(X)$ denote the space of all Borel probability measures on $X$. Let $(\Lambda, d)$ denote another metric space (the parameter space, with $d$ denoting the metric), which is $\sigma$-compact. Again, one can think of $\Lambda = \mathbb{R}^n$ as a characteristic example. Let $f : \mathcal{P} \subset \mathcal{M}_1(X) \to \Lambda$ denote a Borel measurable map. Thus, $f$ denotes a parameterization of the probability measures in $\mathcal{P}$, with $f(P)$ being the parameter associated with $P$. We are interested in the estimation of $f(P)$, based on a sequence $x_1, x_2, \ldots$ of i.i.d. observations with marginal distribution $P$.

Definition 1 (Discernibility and Estimation)
a) $A_0, A_1, \ldots, A_k \subset \mathcal{P}$ are discernible if there exists a strongly consistent decision rule for deciding to which $A_i$ an unknown $P \in \bigcup_{i=0}^{k} A_i$ belongs; that is, if there exists a sequence of Borel functions $g_n : X^n \to \{0, 1, \ldots, k\}$ such that, for any $P \in A_i$, almost surely $g_n(x_1, \ldots, x_n) \to i$ as $n \to \infty$.
b) $\Lambda$ is $f$-estimatable if there exists a strongly consistent estimator for $f(P)$. That is, if there exists a sequence of Borel functions $g_n : X^n \to \Lambda$ such that, for all $P \in \mathcal{P}$, almost surely $g_n(x_1, \ldots, x_n) \to f(P)$ as $n \to \infty$.

We use the term discernibility following [2]. We will also use
notions of uniform discernibility and estimation in which there
is a uniformly consistent estimator over all $P \in \mathcal{P}$.

II.  Main  Results 

We first mention a result showing that it is enough to characterize discernibility to obtain results on the more general estimation problem. Namely, we have shown that $\Lambda$ is $f$-estimatable iff for all $\Lambda_1, \ldots, \Lambda_k \subset \Lambda$ that are positively separated, the sets $f^{-1}(\Lambda_1), \ldots, f^{-1}(\Lambda_k)$ are discernible. We thus concentrate below only on the notion of discernibility.

We have also shown that $A_0, A_1, \ldots, A_k \subset \mathcal{P}$ are discernible
iff there exist sequences $A_i^n \uparrow A_i$ for $i = 0, \ldots, k$ such that $A_0^n, \ldots, A_k^n$

*This  work  was  supported  in  part  by  the  National  Science 
Foundation  under  grants  IRI-9209577  and  IRI-9457645  and  by  the 
U.S.  Army  Research  Office  under  grants  DAAL03-92-G-0320  and 
DAAL03-92-G-0115. 
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are  uniformly  discernible  for  each  n.  Using  this 
result,  our  approach  is  to  make  some  quite  general  assump¬ 
tions  regarding  separation  properties  of  the  Ai  and  uniform 
discernibility.  The  idea  is  that  conditions  regarding  uniform 
discernibility  are  easier  to  check,  but  still  lead  to  statements 
on  (non-uniform)  consistency. 

We now assume that $\mathcal{M}_1(X)$ is endowed with some metric $\rho$. For $A, B \subset \mathcal{P}$, define the separation between $A$ and $B$ as $\rho(A, B) = \inf_{P_1 \in A,\, P_2 \in B} \rho(P_1, P_2)$. $A$, $B$ are called positively separated if $\rho(A, B) > 0$. Sets $A_0, \ldots, A_k$ are said to be separable by closed sets if there exist closed sets $B_0, \ldots, B_k$ with $A_i \subset B_i$ and $B_i \cap B_{i'} = \emptyset$ for $i \ne i'$. We say that $A_0, \ldots, A_k$ are positively separated by finite covers if each $A_i$ can be covered by a finite number of closed balls, and the covers are pairwise positively separated. We recall that in a topological space a set $A$ is said to be $F_\sigma$ if it is a countable union of closed sets, that is, if $A = \bigcup_{i=1}^{\infty} F_i$ where the $F_i$ are closed. $A_0, A_1, \ldots, A_k$ are called $F_\sigma$-separated if $A_i \subset B_i$, where the $B_i$ are $F_\sigma$ sets with $B_i \cap B_{i'} = \emptyset$ for all $i \ne i'$.

Definition 2 (Properness Conditions on $\rho$)

(P1) $\rho$ is said to be (P1)-proper w.r.t. $\mathcal{P}$ if any Borel sets $A_i \subset \mathcal{P}$ which are uniformly discernible are separable by closed sets.
(P2) $\rho$ is said to be (P2)-proper w.r.t. $\mathcal{P}$ if $\rho$ makes $\mathcal{M}_1(X) \cap \mathcal{P}$ into a separable space, and any Borel sets $A_i \subset \mathcal{P}$ which are positively separated by finite covers are uniformly discernible.

In a sense, these conditions impose "non-degeneracy" requirements on the metric $\rho$ restricted to $\mathcal{P}$. The following result extends and corrects Theorem 3.3 of [6]. Results from [1], [5], and Theorem 2 of [2] can be recovered using Theorem 1 below. We denote $A = \bigcup_{i=0}^{k} A_i$.

Theorem 1

a) (sufficient condition) Assume $\rho$ is (P2)-proper w.r.t. $A$. If $A_0, \ldots, A_k$ are $F_\sigma$-separated then they are discernible.

b) (necessary condition) Assume $\rho$ is (P1)-proper w.r.t. $A$. If $A_0, \ldots, A_k$ are discernible then they are $F_\sigma$-separated.
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Abstract  —  The  finite-sample  risk  of  the  k-nearest 
neighbor classifier that uses a weighted $L_p$ metric as a measure of class similarity is examined. For a family of multiclass classification problems with smooth distributions in $\mathbb{R}^n$, the risk is represented as an asymptotic expansion in decreasing fractional powers of the reference-sample size. An analysis of the leading coefficients reveals that the optimal metric (i.e., the metric that minimizes the risk) tends to a weighted Euclidean (i.e., $L_2$) metric as the sample size is increased. Numerical calculations corroborate this finding.

I. The k-Nearest-Neighbor Classifier
Let the elements of $\mathcal{L} = \{1, \ldots, C\}$ denote $C$ states of nature, or pattern classes, and let $P_1, \ldots, P_C$ denote their corresponding stationary prior probabilities. Each pattern is represented by a feature vector $x$, drawn at random from $\mathbb{R}^n$. Specifically, patterns originating from class $\ell \in \mathcal{L}$ are generated by the stationary conditional distribution $F_\ell$.

Labeled feature vectors are generated by a two-step process. First, a class $\ell \in \mathcal{L}$ is chosen at random so that $\Pr[\ell = j] = P_j$; then a random feature vector is drawn according to $F_\ell$. After $m$ independent repetitions of this process, we obtain the labeled reference sample,

$$X_m = \{(x^1, \ell^1), \ldots, (x^m, \ell^m)\}.$$

Given a weighted $L_p$ metric, $d(x, y) = \|A(x - y)\|_p$, where $A$ denotes an $n$-by-$n$, positive-definite, symmetric matrix with $\det A = 1$, and an arbitrary point $x \in \mathbb{R}^n$, the indices of the labeled feature vectors in $X_m$ can be permuted so that

$$d(x, x^1) \le d(x, x^2) \le \cdots \le d(x, x^m). \qquad (1)$$

Here $\|x\|_p = (|x_1|^p + \cdots + |x_n|^p)^{1/p}$ for $1 \le p < \infty$, and $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$, denote the $L_p$ norm. The $k$ nearest neighbors of $x$ then form the subset $\{(x^1, \ell^1), \ldots, (x^k, \ell^k)\}$, and the $k$-nearest-neighbor classifier assigns $x$ to class $L'(x) = \mathrm{maj}(\ell^1, \ldots, \ell^k)$, viz., the most frequently appearing class label in the subset. (Ties, and degeneracies in (1), can be resolved by an arbitrary procedure.) Using this algorithm every point in $\mathbb{R}^n$ can be assigned to a class in $\mathcal{L}$.
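The classification rule just described can be sketched directly. The following is a minimal illustration, assuming the reference sample is a list of (vector, label) pairs and the weight matrix is a list of rows; the function names and the toy data are hypothetical, and ties are broken by whichever label is met first.

```python
from collections import Counter

def weighted_lp_distance(x, y, A, p):
    """d(x, y) = ||A (x - y)||_p for a square weight matrix A given as a list of rows."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = [sum(a * d for a, d in zip(row, diff)) for row in A]
    if p == float("inf"):
        return max(abs(v) for v in z)
    return sum(abs(v) ** p for v in z) ** (1.0 / p)

def knn_classify(x, reference, k, A, p=2.0):
    """Assign x to the most frequent label among its k nearest reference points."""
    ranked = sorted(reference, key=lambda pair: weighted_lp_distance(x, pair[0], A, p))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[2.0, 0.0], [0.0, 0.5]]          # symmetric, positive definite, det = 1
reference = [((1.0, 0.0), 1), ((0.0, 1.2), 2)]
query = (0.0, 0.0)
label_identity = knn_classify(query, reference, 1, identity)
label_stretched = knn_classify(query, reference, 1, stretched)
print(label_identity, label_stretched)
```

The two weight matrices rank the same two reference points in opposite order, which is the sense in which the choice of metric affects the finite-sample behavior of the rule.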

II.  The  Finite-Sample  Risk 

Given a positive integer $k$, an $L_p$ metric, and a finite random reference sample $X_m$, a single test vector $(X, L)$, drawn independently by the same random process, is assigned to class $L' = L'(X)$ by the $k$-nearest-neighbor classifier. We now consider the $m$-sample risk,

$$R_m = \sum_{i=1}^{C} \sum_{j=1}^{C} \lambda_{ij} \Pr[L' = i, L = j],$$

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with the zero-one cost matrix $\lambda_{ij} = 1 - \delta_{ij}$.

For a family of classification problems, described by class-conditional probability densities with uniformly bounded partial derivatives up through order $N + 1$, and a mixture density $f = \sum_{j=1}^{C} P_j f_j$ that is bounded away from zero a.e. on its probability-one support $S \subset \mathbb{R}^n$, we obtain the following:

Theorem 1 There exist constants $c_j$, for $j = 2, 3, \ldots, N$, such that

$$R_m = R_\infty + \sum_{j=2}^{N} c_j m^{-j/n} + O(m^{-(N+1)/n}),$$

where $R_\infty$ is the infinite-sample risk derived by Cover and Hart [1].

(A version of this theorem, restricted to the case $k = 1$, $p = 2$, $A = I$, and $C = 2$, appears in a recent paper [2].) The coefficient $c_2$ evaluates to


c2  =  Dn(p) 


r(k  +  1  +  — ) 


^4tr{(A-1)r/fA-1l, 


where, 


l+(2/n) 


Dn(p ) 


24  [r(tfi)] 

r(l  +  1)r(f +  1) 

r(“f  +  1)r(?*i)S  ’ 


$A^{-1}$ denotes the inverse of the metric weight matrix $A$, and
$H$ is an $n$-by-$n$ matrix, independent of $p$. For the two-class
problem ($C = 2$),


Hij 


s 


1  a2/i 

fi  dxidxj 


1  52/2  \ 
f2  dxidxj) 


Here, $\hat{P}_\ell = P_\ell f_\ell(x)/f(x)$ denotes the posterior probability
that a feature vector with value $x$ originates from class $\ell$.


III. A Desirable Metric

Since $R_\infty$ does not depend upon the chosen metric, Theorem 1 suggests that the finite-sample risk of the $k$-nearest-neighbor classifier may be reduced, for large values of $m$, by selecting a metric that minimizes $c_2$. It can be shown that $D_n(p)$ has a global minimum at $p = 2$ for fixed $n > 1$. Using the Euler-Lagrange multiplier theorem, the trace in $c_2$ is minimized if the weight matrix $A$ satisfies $A^T A = H/(\det H)^{1/n}$. Although it may be difficult to determine $H$, and consequently the optimal matrix $A$, in practice, this analysis and corroborating numerical simulations motivate the use of a weighted Euclidean metric for large reference samples.
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This  work  develops  a  unified  approach  for  hard  optimiza¬ 
tion  problems  involving  data  association,  i.e.  the  assignment 
of elements viewed as "data", $\{x_i\}$, to one of a set of classes, $\{C_j\}$, so as to minimize the resulting cost. The diverse problems which fit this description include data clustering, statistical classifier design to minimize probability of error, piecewise regression, structured vector quantization, as well as optimization problems in graph theory, e.g. graph partitioning.
Whereas  standard  descent-based  methods  are  susceptible  to 
finding  poor  local  optima  of  the  cost,  the  suggested  approach 
provides  some  potential  for  avoiding  local  optima,  yet  without 
the  computational  complexity  of  stochastic  annealing. 

The approach we develop is based on ideas from information theory and statistical physics, and builds on the work of Rose, Gurewitz, and Fox for clustering and related problems [1]. The optimization problem is embedded within a framework in which data are assigned to classes in probability, with Shannon's entropy measure used to control the level of uncertainty or randomness in the assignments. We first address "unconstrained" assignment problems such as data clustering and graph partitioning, in which the data elements are freely assigned to any class, specifiable by binary 0-1 assignment variables. We consider the joint distribution over all possible assignments, $P[x_1 \in C_{j(1)}, \ldots, x_n \in C_{j(n)}]$, and choose it to minimize the expected assignment cost $\langle E \rangle$, given a constraint on Shannon's entropy, $H$. Thus, we seek the best random assignments in the sense of $\langle E \rangle$ for a given $H$. This formulation is equivalently stated by invoking the maximum entropy principle, but the former description is more appealing for optimization. The constrained minimization is equivalent to the unconstrained minimization of the Lagrangian $L = \beta \langle E \rangle - H$, where $\beta$ is the Lagrange multiplier controlling $\langle E \rangle$ and $H$. Physical inspiration for minimizing $L$ is obtained by recognizing that it is proportional to the Helmholtz free energy of a simulated system, with $\langle E \rangle$ the "energy" and $1/\beta$ the "temperature". Thus, a deterministic annealing approach is naturally suggested, wherein, starting from high temperature ($\beta = 0$), the cost and randomness are reduced with the temperature. At low temperature ($\beta \to \infty$) the hard cost is minimized. Our formulation unifies the deterministic annealing method for clustering with mean-field annealing methods proposed for combinatorial optimization [2]. Moreover, the derivation provides an intuitive, yet precise description of what constitutes annealing in these optimization methods. In particular, the annealing process is characterized as a reduction in the system's entropy and expected cost through the increase of a Lagrange multiplier interpreted as the inverse temperature.
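For a single data element with fixed assignment costs, the minimizer of the Lagrangian over probability vectors is the Gibbs distribution, with probabilities proportional to the exponential of minus the inverse temperature times the cost. The sketch below illustrates only this one step under that simplifying assumption; the cost values and function names are hypothetical, and a full annealing procedure would also re-optimize the class parameters at each temperature.

```python
import math

def gibbs_assignments(costs, beta):
    """Minimizer of beta*<E> - H over class probabilities: p_j proportional to exp(-beta*cost_j)."""
    shift = min(costs)                       # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (c - shift)) for c in costs]
    z = sum(weights)
    return [w / z for w in weights]

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def expected_cost(p, costs):
    """Expected assignment cost <E> under the probabilities p."""
    return sum(q * c for q, c in zip(p, costs))

costs = [1.0, 2.0, 4.0]   # hypothetical cost of assigning one element to each of three classes
schedule = (0.0, 0.5, 2.0, 10.0, 50.0)
trace = []
for beta in schedule:
    p = gibbs_assignments(costs, beta)
    trace.append((beta, entropy(p), expected_cost(p, costs)))
    print(beta, [round(q, 3) for q in p])
```

Along the schedule both the entropy and the expected cost decrease, and the assignment probabilities concentrate on the lowest-cost class as the inverse temperature grows.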

While this description may provide insights into existing methods, a more significant benefit lies in its generality, and hence its potential for stimulating development of novel optimization methods tackling heretofore unaddressed assignment problems. Of prime interest are what we will call structurally-constrained problems, wherein the assignments are restricted to be consistent with a (parametrized) classification rule. These problems abound in pattern recognition and source coding, and include statistical classifier design, piecewise regression, and structured vector quantization. The restricted assignments may be produced by a nearest prototype rule, a decision tree, or neural network structures such as radial basis functions or multilayer perceptrons. Thus, the previous optimization framework requires substantial extension in order to enforce the structural constraint on the assignments. To do so, we introduce an additional cost $C_s$, which quantifies achievement of the structural constraint. This cost is incorporated within a generalization of the basic formulation we have described, so that the annealing process controls $\langle C_s \rangle$, as well as $\langle E \rangle$ and $H$. A second Lagrange multiplier is identified which controls $\langle C_s \rangle$. This parameter is chosen to provide the optimal "level" of structural constraint consistent with $\langle E \rangle$ and $H$ at each temperature in the annealing process. In the limit $\beta \to \infty$, a "hard" classifier with the requisite structure is achieved, and the assignment cost is minimized directly. This general optimization paradigm has significant potential for outperforming descent-based approaches for structurally-constrained assignment problems. In several forthcoming papers, these ideas are applied to the two fundamental problems of supervised learning, statistical classification and regression, as well as to the design of novel source coding structures (generalized vector quantizers), with promising results achieved in all of these domains [3], [4].
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Abstract  —  In  the  paper  we  study  convergence  prop¬ 
erties  of  radial  basis  function  (RBF)  networks  in  non¬ 
parametric  classification  for  a  large  class  of  basis  func¬ 
tions  with  parameters  of  RBF  nets  learned  through 
empirical  risk  minimization. 

I.  Introduction 

In the classification (pattern recognition) problem, based upon the observation of a random vector $X \in \mathbb{R}^d$, one has to guess the value of a corresponding label $Y$, where $Y$ is a random variable taking its values from $\{-1, 1\}$. The decision is a function $g : \mathbb{R}^d \to \{-1, 1\}$, whose goodness is measured by the error probability $L(g) = P\{g(X) \ne Y\}$. Let $g^*$ be a Bayes decision and $L^*$ be the Bayes risk. An empirical decision rule $g_n$ is a function $g_n : \mathbb{R}^d \times (\mathbb{R}^d \times \{-1, 1\})^n \to \{-1, 1\}$, whose error probability is given by

$$L(g_n) = P\{g_n(X, D_n) \ne Y \mid D_n\},$$

where $D_n = ((X_1, Y_1), \ldots, (X_n, Y_n))$ is a training sequence of i.i.d. random variables independent of $(X, Y)$. A sequence of classifiers $\{g_n\}$ is called strongly consistent if $\lim_{n \to \infty} (L(g_n) - L^*) = 0$ almost surely, and $\{g_n\}$ is strongly universally consistent if it is strongly consistent for any distribution of $(X, Y)$.

Let $K : \mathbb{R}^d \to \mathbb{R}$ be a kernel function. Consider RBF networks given by

$$f_\theta(x) = \sum_{i=1}^{k} w_i K(A_i[x - c_i]) + w_0 \qquad (1)$$

where $\theta = (w_0, \ldots, w_k, c_1, \ldots, c_k, A_1, \ldots, A_k)$ is the vector of parameters, $w_0, \ldots, w_k \in \mathbb{R}$, $c_1, \ldots, c_k \in \mathbb{R}^d$, and $A_1, \ldots, A_k$ are nonsingular $d \times d$ matrices. Let $\{k_n\}$ be a sequence of positive integers. Define $\mathcal{A}_n$ as the set of RBF networks in the form of (1) with $k = k_n$. Given an $f_\theta$ as above, we define the classifier $g_\theta$ to be $1$ if $f_\theta(x) > 0$, and $-1$ otherwise.

Let $\mathcal{G}_n$ be the class of classifiers based on the class of functions $\mathcal{A}_n$. To every classifier $g \in \mathcal{G}_n$, assign the empirical error probability

$$\hat{L}_n(g) = \frac{1}{n} \sum_{i=1}^{n} I_{\{g(X_i) \ne Y_i\}}.$$

We pick a classifier $g_n$ from $\mathcal{G}_n$ by minimizing this empirical error probability. The distance $L(g_n) - L^*$ between the error probability of the selected rule and the Bayes risk is decomposed into the estimation error and the approximation error:

$$L(g_n) - L^* = \Bigl( L(g_n) - \inf_{g \in \mathcal{G}_n} L(g) \Bigr) + \Bigl( \inf_{g \in \mathcal{G}_n} L(g) - L^* \Bigr).$$
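To make the objects in this section concrete, the sketch below evaluates an RBF network of the form (1), turns it into a classifier with labels in {-1, 1}, and computes its empirical error probability on a small labeled sample. It is only an illustration: the Gaussian kernel, the parameter values, and the function names are assumptions, and no minimization over the parameters is attempted.

```python
import math

def gaussian_rbf(u):
    """K(u) = exp(-||u||^2), one common (assumed) choice of kernel."""
    return math.exp(-sum(v * v for v in u))

def rbf_net(x, w0, weights, centers, matrices, kernel=gaussian_rbf):
    """f_theta(x) = sum_i w_i K(A_i [x - c_i]) + w_0."""
    total = w0
    for w, c, A in zip(weights, centers, matrices):
        diff = [xi - ci for xi, ci in zip(x, c)]
        u = [sum(a * d for a, d in zip(row, diff)) for row in A]
        total += w * kernel(u)
    return total

def classify(x, theta):
    """Classifier induced by the network: 1 if f_theta(x) > 0, otherwise -1."""
    return 1 if rbf_net(x, *theta) > 0 else -1

def empirical_error(theta, sample):
    """Fraction of labeled pairs (x, y) that the classifier gets wrong."""
    return sum(1 for x, y in sample if classify(x, theta) != y) / len(sample)

# One unit centered at the origin with identity scaling (hypothetical parameters).
theta = (-0.5, [1.0], [(0.0, 0.0)], [[[1.0, 0.0], [0.0, 1.0]]])
sample = [((0.0, 0.0), 1), ((3.0, 0.0), -1), ((0.5, 0.0), -1), ((0.0, 2.0), 1)]
error = empirical_error(theta, sample)
print(error)
```

Empirical risk minimization would search over the parameter vector for the smallest value of this error; the sketch only evaluates it for one fixed choice.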
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II.  Approximation 

We consider the approximation error when $\mathcal{A}_k$ is the family of RBF networks of the form of (1).

Theorem 1 Suppose $K : \mathbb{R}^d \to \mathbb{R}$ is bounded and

$$K \in L_1(\lambda) \cap L_p(\lambda)$$

for some $p \in [1, \infty)$, and assume that $\int K(x)\,dx \ne 0$. Let $\mu$ be an arbitrary probability measure on $\mathbb{R}^d$ and let $q \in (0, \infty)$. Then the RBF nets in the form (1) are dense in both $L_q(\mu)$ and $L_p(\lambda)$. In particular, if $m \in L_q(\mu) \cap L_p(\lambda)$, then for any $\epsilon > 0$ there exists a $\theta = (w_0, \ldots, w_k, b_1, \ldots, b_k, c_1, \ldots, c_k)$ such that

$$\int_{\mathbb{R}^d} |f_\theta(x) - m(x)|^q \mu(dx) < \epsilon \quad \text{and} \quad \int_{\mathbb{R}^d} |f_\theta(x) - m(x)|^p\,dx < \epsilon.$$

III. Classification

Define the class of sets

$$\mathcal{C}_1 = \{\{x \in \mathbb{R}^d : K(A[x - c]) > 0\} : c \in \mathbb{R}^d,\ A \text{ invertible}\}.$$

Theorem 2 Let $K$ be an indicator such that $V_{\mathcal{C}_1} < \infty$. Then for every $n$, $k_n$ and $\epsilon > 0$,

$$P\Bigl\{L(g_n) - \inf_{g \in \mathcal{G}_n} L(g) > \epsilon\Bigr\} \le 4 \exp\Bigl(-n\Bigl[\frac{\epsilon^2}{32} - \frac{C_2 k_n \log(C_1 n)}{n}\Bigr]\Bigr)$$

for some constants $C_1$ and $C_2$ depending only on $V_{\mathcal{C}_1}$.

Suppose that the set of RBF networks given by (1), $k$ being arbitrary, is dense in $L_1(\mu)$ on balls $\{x \in \mathbb{R}^d : \|x\| \le B\}$ for any probability measure $\mu$ on $\mathbb{R}^d$. If $k_n \to \infty$ and $n^{-1}(k_n \log n) \to 0$ as $n \to \infty$, then the sequence of classifiers $g_n$ minimizing the empirical error probability is strongly universally consistent.

Using a result by Macintyre and Sontag we can also prove that empirical error probability minimization can fail to provide a distribution-free upper bound on $L(g_n) - \inf_{g \in \mathcal{G}_n} L(g)$ for other kernels, however nice these kernels may seem. We have the following counterexample:

Theorem 3 There exists a $K : \mathbb{R} \to \mathbb{R}$ which is symmetric around 0, monotone decreasing on $\mathbb{R}^+$ and infinitely differentiable, such that for $k > 2$ there is no distribution-free upper bound on the estimation error which converges to zero as $n \to \infty$.
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Abstract  —  We  estimate  the  best,  nonlinear,  mean- 
square predictor for a Markov process from an observed, finite realization of the process when the true Markov order is unknown. In particular, we propose a universal minimum complexity estimator, which does not know the true Markov order, and yet delivers the same statistical performance as that delivered by a minimum complexity estimator which knows the true Markov order.

I.  Introduction 

Given a sequence of observations $\{X_i\}_{i=1}^{N}$ drawn from a stationary Markov process $\{X_t\}_{t=-\infty}^{\infty}$ of order $q$, we are interested in estimating the conditional mean of $X_0$ given the past $X_{-1}, X_{-2}, \ldots, X_{-q}$, namely

$$m_q(X_{-1}, X_{-2}, \ldots, X_{-q}) = E[X_0 \mid X_{-1}, X_{-2}, \ldots, X_{-q}].$$

The conditional mean $m_q$ is the best, nonlinear, mean-square predictor for the Markov process $\{X_t\}_{t=-\infty}^{\infty}$, and is thus an important object of knowledge.

In addition to the Markov assumption, we assume that $\{X_t\}_{t=-\infty}^{\infty}$ is strongly mixing [5] with exponential decay [2]. This additional assumption is required, technically, to construct consistent minimum complexity estimators for $m_q$.

If the true Markov order $q$ is known, then we can estimate the predictor $m_q$ by proceeding essentially as in Modha and Masry [2]. In particular, if we assume that the predictor $m_q$ possesses a certain bounded spectral norm [1], then it is possible to construct a minimum complexity estimator, say $\hat{m}_{q,N}$, based on neural networks such that [2]

MISE(mg>,v,  rnq)  —  O  ^(log  N)*  N~ 4  j  ,  (1) 

where  MISE  denotes  a  certain  mean  integrated  squared  error. 

In this paper, we consider the practically important case when the true Markov order $q$ is unknown. In particular, assuming, as before, that the predictor $m_q$ possesses a bounded spectral norm, we propose (see Section II below) a universal minimum complexity estimator, say $\hat{m}_N$, based on neural networks such that [3]

MISE(mjv,  mq)  =  O  ^(log  N)^N~4  ^  .  (2) 

Precise  results  and  proofs  can  be  found  in  the  full  paper  [3]. 

Comparing (1) and (2), we find that our estimator $\hat{m}_N$, which does not know $q$, achieves the same rate of convergence as that of $\hat{m}_{q,N}$, which knows $q$. Asymptotically, the estimator $\hat{m}_N$ not only learns the true predictor $m_q$, but also (implicitly) discovers the true Markov order $q$. In other words, the estimator $\hat{m}_N$ is universal. This notion of universality parallels the notions of universality arising in the context of coding of finite alphabet processes and in the context of mean-square prediction of Gaussian ARMA processes [4].

1 This work was supported by the Office of Naval Research under Grant N00014-90-J-1175.


II.  Universal  Estimation  Scheme 

We now outline a two-stage estimation scheme to construct the universal minimum complexity estimator $\hat{m}_N$, which was introduced and discussed in the previous section. Our estimation scheme, which can be found in the full paper [3], builds on the results in Modha and Masry [2], and is inspired by the results in Barron [1] and in Rissanen [4].

Stage 1: For each fixed memory $1 \le p \le \log N$, let $m_p$ denote
the conditional mean of $X_0$ given the past $X_{-1}, X_{-2}, \ldots, X_{-p}$.
Given $N$ observations $\{X_i\}_{i=1}^{N}$, we first estimate $m_p$ (for each
$1 \le p \le \log N$) using neural networks as follows.

A neural network parametrized by a very small number of
parameters has a small variance (estimation error), but also
has a large bias (approximation error) in estimating $m_p$; on
the other hand, a neural network parametrized by a very large
number of parameters has a small bias, but also has a large
variance in estimating $m_p$. The minimum complexity estimator
$\hat{m}_{p,N}$, which minimizes a certain penalized empirical
loss, selects the neural network (parametrized by an appropriate
number of parameters) that achieves the best trade-off
between the bias and variance. Thus, $\hat{m}_{p,N}$ achieves the
smallest statistical risk (bias + variance) in estimating $m_p$.

Stage 2: Having constructed the sequence of minimum complexity
estimators $\{\hat{m}_{p,N}\}_{1 \le p \le \log N}$, we now select the estimator
$\hat{m}_N$ as the element of the sequence that achieves the smallest
statistical risk in estimating $m_q$.

In particular, for a very small $p$, $\hat{m}_{p,N}$ may be close to $m_p$,
but $m_p$ may be far from $m_q$; on the other hand, for a very large
$p$, $m_p$ may be close to $m_q$, but $\hat{m}_{p,N}$ may be far from $m_p$. The
universal minimum complexity scheme selects a data-driven
memory $\hat{p}$, which minimizes a certain penalized empirical loss
and hence achieves the best trade-off between the competing
terms. Finally, we use the element corresponding to $\hat{p}$, namely
$\hat{m}_{\hat{p},N}$, as our universal estimator $\hat{m}_N$.
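The two-stage selection can be sketched in a few lines of Python. This is an illustration under loudly stated assumptions, not the scheme of the full paper [3]: linear autoregressive predictors fitted by least squares stand in for the neural networks, and the penalty `lam * p * log(N) / N` is a hypothetical choice. The names `solve`, `fit_ar`, `select_memory`, `ridge` and `lam` are invented here.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ar(xs, p, ridge=1e-9):
    """Stage 1 stand-in: least-squares linear predictor with memory p.

    Returns (coefficients, empirical mean squared error).
    """
    rows = [[xs[t - k] for k in range(1, p + 1)] for t in range(p, len(xs))]
    ys = xs[p:]
    gram = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(gram, rhs)
    err = sum((y - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, y in zip(rows, ys)) / len(ys)
    return coef, err


def select_memory(xs, lam=1.0):
    """Stage 2: pick the memory p that minimizes a penalized empirical loss."""
    n = len(xs)
    max_p = max(1, int(math.log(n)))  # candidate memories 1..floor(log N)
    scores = {}
    for p in range(1, max_p + 1):
        _, err = fit_ar(xs, p)
        scores[p] = err + lam * p * math.log(n) / n  # hypothetical penalty
    return min(scores, key=scores.get), scores


# Synthetic order-2 process, used only to exercise the code.
random.seed(0)
xs = [0.0, 0.0]
for _ in range(2000):
    xs.append(0.5 * xs[-1] - 0.3 * xs[-2] + random.gauss(0.0, 1.0))
p_hat, scores = select_memory(xs)
```

The structure mirrors the text: one fit per candidate memory, followed by a single data-driven choice among them. Only the model class and the penalty are placeholders.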
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I.  Introduction 

The following problem in multiterminal source coding was introduced
in [1]. A firm's CEO is interested in a data sequence
$\{X(t)\}_{t=1}^{\infty}$ which cannot be observed directly. The CEO
employs a team of $L$ agents who observe independently corrupted
versions of $\{X(t)\}_{t=1}^{\infty}$. Let $R$ be the total data rate at which
the agents may communicate information about their observations
to the CEO. The agents are not allowed to convene.
In [1] Berger and Zhang determine the asymptotic behavior
of the minimal error frequency in the limit as $L$ and $R$ tend
to infinity. Their result is for a discrete memoryless source and
observations. In this paper we consider a special case of the
continuous source and observations problem. We assume that
the source is an i.i.d. sequence of zero-mean Gaussian random
variables ($\mathcal{N}(0, \sigma_X^2)$) and the observations are corrupted by
independent, identically distributed memoryless Gaussian noise
($\mathcal{N}(0, \sigma_N^2)$). The CEO is interested in reconstructing the source
with minimum mean squared error. We study the asymptotic behavior
of the minimum achievable distortion in the limit as first $L$
and then $R$ tends to infinity. That is, we study the behavior
of

$$\beta(\sigma_X, \sigma_N) = \lim_{R \to \infty} \lim_{L \to \infty} R \, \frac{D(L, R)}{\sigma_X^2}.$$

The solution to this problem differs significantly from that for
the discrete source problem studied in [1]. Our main result is

that  asymptotically  the  distortion  decays  at  best  as  1/R.  We 

2 

also  derive  the  upper  bound  (t2n)  <  These  results 

should be contrasted with the fact that, if the agents were
allowed to convene before communicating to the CEO, they
could smooth out their noisy observations and achieve a
rate-distortion performance corresponding to that of the source
$X$, i.e., the distortion would decay as $2^{-2R}$. Thus, there is
a significant performance degradation in the isolated agents
case. This problem also serves as an interesting example of the
connections between information theory and statistics. We
use the Cramér-Rao bound for random parameter estimation
to lower bound the achievable distortion. The problem is
described in detail in Section II and the main result is presented
in Section III.
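The contrast between convened and isolated agents can be made concrete with a short numerical sketch. Two standard facts are used: the MMSE of a Gaussian variable given several independent noisy looks, and the Gaussian distortion-rate function $\sigma_X^2 2^{-2R}$. The function names and the constant `c` in the $1/R$ law are hypothetical; the abstract only states that the isolated-agent distortion decays at best as $1/R$.

```python
def pooled_mmse(var_x, var_n, num_agents):
    """MMSE of X from num_agents observations X + N_i with i.i.d. Gaussian noise."""
    return var_x * var_n / (num_agents * var_x + var_n)


def convened_distortion(var_x, rate):
    """Gaussian distortion-rate function, with the rate in bits per sample."""
    return var_x * 2.0 ** (-2.0 * rate)


def isolated_scaling(c, rate):
    """A 1/R decay law with a placeholder constant c (not taken from the paper)."""
    return c / rate


for rate in (1.0, 4.0, 16.0):
    print(rate, convened_distortion(1.0, rate), isolated_scaling(1.0, rate))
```

As `num_agents` grows, `pooled_mmse` tends to zero, which is why convened agents can in effect describe the clean source; the exponential decay in `rate` then dominates any $1/R$ law.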


where $X(t)$ is $\mathcal{N}(0, \sigma_X^2)$ distributed and $N_i(t)$ is independent
and identically distributed over $i, t$ as $\mathcal{N}(0, \sigma_N^2)$, for
$i = 1, \ldots, L$ and $t = 1, \ldots, n$.

For $i = 1, \ldots, L$, Agent $i$ encodes a block of length $n$
from his observed data $\{y_i(t)\}_{t=1}^{n}$ using a source code $\mathcal{C}_i^n$ of
rate $R_i^n = \frac{1}{n} \log |\mathcal{C}_i^n|$. The code words from the $L$ agents,
$C_1^n, \ldots, C_L^n$, are sent to a central estimator whose task is to
recover the source message $x^n = (x(1), \ldots, x(n))$ as accurately
as possible in terms of the mean squared error defined as

$$D^n(X^n, \hat{X}^n) = \frac{1}{n} E \sum_{t=1}^{n} \left( X(t) - \hat{X}(t) \right)^2 \qquad (1)$$

where $\hat{X}^n$ is the estimate of the random message $X^n$ made by
the CEO. Denote the CEO's estimate by

$$\hat{X}^n = \phi_L^n(C_1^n, \ldots, C_L^n) \qquad (2)$$

where $C_i^n$ denotes the code word selected by Agent $i$; $C_i^n$ is
random because of the joint randomness of the message and
observation noise.

We study the tradeoff between the total rate, $R = \sum_{i=1}^{L} R_i^n$,
and the mean squared error $D^n(X^n, \hat{X}^n)$ in the
following format. For the given codes $\mathcal{C}_i^n$, $i = 1, \ldots, L$, of block
length $n$, let

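$$D^n(\mathcal{C}_1^n, \ldots, \mathcal{C}_L^n) = \min_{\phi_L^n} D^n\left(X^n, \phi_L^n(C_1^n, \ldots, C_L^n)\right) \qquad (3)$$

Define

$$D^n(L, R) = \min_{\{\mathcal{C}_i^n\} :\ \sum_{i=1}^{L} R_i^n \le R} D^n(\mathcal{C}_1^n, \ldots, \mathcal{C}_L^n), \qquad (4)$$

$$D(R) = \lim_{L \to \infty} \lim_{n \to \infty} D^n(L, R) \qquad (5)$$

and

$$\beta(\sigma_X, \sigma_N) = \lim_{R \to \infty} R \, \frac{D(R)}{\sigma_X^2}. \qquad (6)$$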

III.  Main  Result 

Theorem. Let $Q(u|y)$ be any conditional density on an arbitrary
alphabet $\mathcal{U}$, and let $Q(u|x) = \int W(y|x) Q(u|y) \, dy$. Then
under the usual Cramér-Rao regularity conditions


II.  Problem  Statement 

The formal description of the Gaussian CEO problem is as follows.
The CEO is interested in an i.i.d. Gaussian data sequence
$\{X(t)\}$ with variance $\sigma_X^2$. This data sequence cannot be
directly observed by the CEO. Versions $\{Y_i(t)\}$ of $\{X(t)\}$ corrupted
by independent additive white Gaussian noise with
variance $\sigma_N^2$ are observed by a team of $L$ agents. The agents
are not allowed to convene; Agent $i$ has to send data based
solely on his own noisy observations $\{Y_i(t)\}$. The agents
are required to send encoded versions of the data observed
through noiseless communication channels with a total rate
$R$. Symbolically,


Yi(t)=X(t)  +  Ni(t) 


P(<rx,<*N)  >  inf 


HY1U\X) 


Also , 


^xE[-^logQ(U\X)] 


>  0 


We  believe  that  the  bounds  are  actually  tight,  but  have 
been  unable  to  establish  this. 
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Abstract  We  consider  the  problem  of  sepa¬ 

rate  coding  for  two  correlated  memoryless  Gaussian 
sources. We determine the rate-distortion region in
the special case where one source plays the role of partial
side information for reproducing sequences emitted from
the other source with a prescribed average distortion
level. We also derive an explicit outer bound of the
rate-distortion region, demonstrating that the inner
bound obtained by Berger partially coincides with the
rate-distortion region.


I.  Introduction 

Let $X$ and $Y$ be Gaussian random variables with mean 0
and variance $\sigma_X^2$ and $\sigma_Y^2$, respectively. We denote by $\rho$ the
correlation coefficient between $X$ and $Y$. We write $n$ independent
copies of $X$, $Y$ as $X^n = X_1, X_2, \ldots, X_n$ and $Y^n = Y_1, Y_2, \ldots, Y_n$,
respectively. Data sequences $X^n$ and $Y^n$ are separately encoded
to $\varphi_1(X^n)$ and $\varphi_2(Y^n)$ and both are sent to the information
processing center, where the decoder function $\psi$ observes
$\varphi_1(X^n)$ and $\varphi_2(Y^n)$ to output the estimate $(\hat{X}^n, \hat{Y}^n)$
of $(X^n, Y^n)$. The encoder functions $\varphi_i$ ($i = 1, 2$) satisfy the
rate constraints $\frac{1}{n} \log \|\varphi_i\| \le R_i + \delta$ ($i = 1, 2$), where $\delta$ is
an arbitrary prescribed positive number. For $(\hat{X}^n, \hat{Y}^n) =
\psi(\varphi_1(X^n), \varphi_2(Y^n))$, define the mean square errors $\Delta_i$ ($i = 1, 2$) by
$\Delta_1 = \frac{1}{n} E \sum_{k=1}^{n} (X_k - \hat{X}_k)^2$, $\Delta_2 = \frac{1}{n} E \sum_{k=1}^{n} (Y_k - \hat{Y}_k)^2$.

For given positive numbers $D_i$ ($i = 1, 2$), a rate pair $(R_1, R_2)$
is admissible if for any $\delta > 0$ and any $n \ge n_0(\delta)$ there exists a
triple $(\varphi_1, \varphi_2, \psi)$ such that $\Delta_i \le D_i + \delta$ ($i = 1, 2$). We denote
by $\mathcal{R}(D_1, D_2)$ the set of all admissible pairs $(R_1, R_2)$.

Our main goal is to determine the rate-distortion region
$\mathcal{R}(D_1, D_2)$. Berger [1] derived an inner bound of $\mathcal{R}(D_1, D_2)$.
However, its optimality was not discussed in his paper.

In this paper, we determine the rate-distortion region for a
certain special case, and show that the inner bound obtained
by Berger partially coincides with $\mathcal{R}(D_1, D_2)$.


II.  Statement  of  Main  Results 

If $D_2 \ge \sigma_Y^2$, there is essentially no constraint between $\hat{Y}^n$
and $Y^n$. This means that $Y^n$ works as auxiliary information
to reproduce $X^n$ with a distortion level not greater than $D_1$,
and that $\mathcal{R}(D_1, D_2)$ does not depend on $D_2$. We denote this
region by $\mathcal{R}_1(D_1)$. Then the following theorem holds.

Theorem  1  :  For  every  D 1  >  0 


Tli(Di) 


R>  >  |  log  [41 

2  \ 

for some $0 < s < \sigma_Y^2$,
where $\log^+ x = \max\{\log x, 0\}$.


Wyner-Ziv [2] and Wyner [3] have determined the rate-distortion
function for the case $s \to 0$, in which the decoder can
fully observe the side information $Y^n$. Theorem 1 is an extension
of their results to the case in which the decoder can observe
only partial side information. For the case in which the random pair
$(X, Y)$ takes finitely many values, an inner region of $\mathcal{R}_1(D_1)$ was
derived by Berger et al. [4], but the determination of
$\mathcal{R}_1(D_1)$ still remains open for this case.

Next, we derive an explicit outer bound of $\mathcal{R}(D_1, D_2)$. Let
$\mathcal{R}_2(D_2)$ be the rate-distortion region for the case $D_1 \ge \sigma_X^2$.
We obtain the following theorem.


Theorem 2: For every $D_1, D_2 > 0$,


$\mathcal{R}(D_1, D_2) \subseteq \mathcal{R}_{out}(D_1, D_2)$,


where 

'Jl„u,(D1,D2)  =  K1(Di)r\K2(D2)r\iii2(DuD2), 

tz\2(D\,d2)  2  ^ 

=  {(j?l  ,  R2  )  |  Rl  +  ^2  >  2  P  _  P  }  D\D2 

The inner bound $\mathcal{R}_{in}(D_1, D_2)$ of $\mathcal{R}(D_1, D_2)$ according to
Berger [1] is

TZiADi ,  d2)  =  Ki(Di)  n  n2(D2)  n  k12(dud2), 

where 


22 

=  R2)  I  R\  +  R2  >  \  ~~  p2)o  '  D^'dVJ  }’ 

p  =  1  +  \[l 


D 1  Di 


The boundary of $\mathcal{R}_{in}(D_1, D_2)$ consists of one straight line
segment defined by the boundary of $\mathcal{R}_{12}(D_1, D_2)$ and two
curved portions defined by the boundaries of $\mathcal{R}_1(D_1)$ and
$\mathcal{R}_2(D_2)$. Hence, the inner bound established by Berger partially
coincides with $\mathcal{R}(D_1, D_2)$ at two curved portions of its
boundary.  The  gap  between  inner  and  outer  bounds  is  the  dif¬ 
ference  of  the  rate  sum  given  by  A R  =  \  log  [f  ] .  We  found 
that  A  R  is  negligible  for  relatively  small  values  of  Di  and  D2. 
However,  further  discussions  are  still  necessary  for  resolving 
this  gap. 
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Multilevel  Diversity  Coding  with  Symmetrical  Connectivity 
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I.  Introduction 

Multilevel diversity coding was recently introduced by Roche
[1] and Yeung [2]. In a Multilevel Diversity Coding System,
the decoders are partitioned into multiple levels. The
reconstructions of the source by decoders within the same level
are identical and are subject to the same distortion criterion.
A comprehensive discussion of multilevel diversity coding is
found in [2]. In particular, we refer the readers to [2] for the
basic results and the notion of superposition in multilevel
diversity coding.

In [2], a class of problems in multilevel diversity coding
was suggested. In this paper we consider one such problem
with symmetrical connectivity between the encoders and
decoders (see Fig. 1). In this problem, there are three encoders
and seven decoders. The source $\{X_k\}$ is an independent and
identically distributed (i.i.d.) process. The seven decoders
belong to three levels: Decoders 1, 2, and 3 belong to Level
1; Decoders 4, 5, and 6 belong to Level 2; and Decoder 7
belongs to Level 3. Note that each Level $i$ decoder has access
to $i$ encoders, and $\{(\hat{X}_i)_k\}$ is the reproduction of $\{X_k\}$ by
a Level $i$ decoder. We are interested in finding the trade-off
between the rates of the encoders and the distortions of the
reconstructions of the source by the decoders.


Figure 1: A symmetrical multilevel diversity coding system

where $\oplus$ is defined by


II.  The  Main  Result 

Defining rates and distortions in the usual way, we let $R_i$
be the rate of Encoder $i$ and let $D_i$ be the maximum allowable
distortion for each decoder in Level $i$, where in general
each level has its own distortion function. Say that a sextuple
$(R_1, R_2, R_3, D_1, D_2, D_3)$ is admissible if there exists a coding
scheme with the given rates and expected distortions in the
usual Shannon sense. Let

$\mathcal{R} = \{(R_1, R_2, R_3) : (R_1, R_2, R_3, D_1, D_2, D_3)$ is admissible$\}$,

and let $\mathcal{R}^*$ be the set consisting of all $(R_1, R_2, R_3)$
satisfying the two conditions below:

1)  for  /  ==  1,2,3,  there  exists  Yj  such  that 


2) 


E  di(X,Xi)  <  Di\ 
Ri>I{X  ;Yi)  fori  <i  <3 


$$x \oplus y = \begin{cases} x + y & \text{if } x + y \le 3 \\ x + y - 3 & \text{if } x + y > 3. \end{cases}$$


It  is  shown  in  [3]  that  condition  2)  is  equivalent  to 
2')  For  1,2,3, 


D3) 

Ri=r\+r2i  +rl 

(6) 

where  >  0,  and 

,d3) 

rl  >  I(X\  Xi)  for  1  <  i  <  3 

(7) 

Ti  +  t)  >  I{X ;  X2 |Xr)  for  1  <  i  <  j  <  3 

(8) 

(1) 

’•i3  +  r§+rf  >I{X-,X3\XUX2). 

We  now  state  our  main  result. 

(9) 

(2) 

Theorem 1  $\mathcal{R}$ is the closure of con($\mathcal{R}^*$), where con($\mathcal{R}^*$)
denotes the convex hull of $\mathcal{R}^*$.

(3) 

2R{  +  Rim  +  Rie 2  >  4i(Y;  YL)  +  2I(Y;  X2\Xx) 

-f/(Y;Y3 \Xi,X2)  for  1  <  i  <  3  (4) 

Ri+Ri+R3  >  37(X;X1)+ 

_ ___  +I(X’Xs\XuX3),  (5) 
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Proof  The  rate  constraints  in  2')  are  used  for  proving  the 
admissibility of $\mathcal{R}^*$, while the rate constraints in 2) are used
for proving the converse. Please refer to [3] for the details of
the proof. □
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I.  Introduction 

Consider discrete memoryless sources $X$ and $Y$ with joint pmf
$P_{XY}$ and let $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ be a single-letter distortion
measure. Suppose the source $X$ must be compressed at rate $R$
with distortion no larger than $D$, when the side information
$Y$ is available only to the decoder. This is the Wyner-Ziv
model for lossy source coding with side information at the
decoder [1, 2]; one method of constructing good source codes
is described below.

Let $U$ be a r.v. taking values in $\mathcal{U}$ and assume that $UXY$
have a joint distribution satisfying $U \to X \to Y$. Define the
following concepts:

1.  An ordered collection $C$ of $M = e^{n(I(X;U)+\eta)}$ $n$-tuples
over the alphabet $\mathcal{U}$ is called a code book if each
$n$-tuple is an element of the $\delta$-typical set $T_{[U]\delta}$; let $\mathcal{C}$ be
the collection of all such code books.

2.  For $R > 0$, a map $f : \{1, \ldots, M\} \to \{1, \ldots, e^{nR}\}$ is
called a binning scheme; let $\mathcal{F}$ be the collection of all
such binning schemes.
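A binning scheme in the sense of item 2 is just a map from codeword indices to bin indices. The following minimal sketch draws one uniformly at random; the uniform draw, the seed, and the names `make_binning` and `bin_members` are assumptions made for illustration, and the rate is in nats to match the $e^{nR}$ convention above.

```python
import math
import random


def make_binning(num_codewords, n, rate, seed=0):
    """Draw a map f: {1..M} -> {1..floor(e^{n*rate})} uniformly at random."""
    rng = random.Random(seed)
    num_bins = max(1, math.floor(math.exp(n * rate)))
    f = {i: rng.randrange(1, num_bins + 1) for i in range(1, num_codewords + 1)}
    return f, num_bins


def bin_members(f, k):
    """Indices of all codewords that the binning scheme f assigns to bin k."""
    return [i for i, b in f.items() if b == k]


f, num_bins = make_binning(num_codewords=50, n=4, rate=0.5)
print(num_bins, bin_members(f, 1))
```

The encoder would transmit only the bin index `f[i]`; the decoder must then pick the right member of `bin_members(f, k)` with the help of its side information.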

Given a code $(f, C) \in \mathcal{F} \times \mathcal{C}$, we consider standard
joint-typicality encoding and joint-typicality bin decoding as
described in [2]. It is well known that by suitable choice of
$U \to X \to Y$, one can generate codes $(f, C) \in \mathcal{F} \times \mathcal{C}$ which
operate over the entire region of achievable rate-distortion
tuples. In other words, the family of coding strategies $\mathcal{F} \times \mathcal{C}$
contains schemes which are "optimal" in the rate-distortion
sense.

This leads to the following question: for a fixed $U \to X \to Y$,
is it possible to bound the error performance of the codes
$(f, C) \in \mathcal{F} \times \mathcal{C}$? We can place meaningful exponential bounds
on the error behavior of codes in the ensemble, if we use a
minimum entropy decoder [3]. This decoder selects from the
specified bin (say, bin $k$) any code word $u_j$ which minimizes
the empirical conditional entropy $H(u_j|y)$.
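The minimum entropy rule can be sketched directly from its description: compute the empirical conditional entropy of each candidate code word given the side-information sequence and keep the smallest. The helper names below are invented for illustration, entropies are in nats, and ties are broken by list order, which is one arbitrary reading of "any code word".

```python
import math
from collections import Counter


def empirical_conditional_entropy(u, y):
    """H(u|y) in nats, computed from the joint type of the sequences u and y."""
    n = len(u)
    joint = Counter(zip(u, y))
    marg_y = Counter(y)
    return -sum(c / n * math.log(c / marg_y[b]) for (_, b), c in joint.items())


def min_entropy_decode(bin_codewords, y):
    """Return a code word from the bin that minimizes H(u|y)."""
    return min(bin_codewords, key=lambda u: empirical_conditional_entropy(u, y))


y = (0, 0, 1, 1)
candidates = [(0, 1, 0, 1), (0, 0, 1, 1)]
print(min_entropy_decode(candidates, y))
```

In this toy bin the second candidate is a deterministic function of `y`, so its empirical conditional entropy is zero and it is the one selected.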

II.  The  Error  Exponent 

There  are  two  possible  error  events: 

$$E(f, C) = \{(x, y) : \exists i,\ (x, u_i) \in T_{[XU]\delta},\ \exists j \ne i : f(i) = f(j),\ H(u_j|y) \le H(u_i|y)\}.$$

Event $E'(f, C)$ is an encoder failure: $x \in \mathcal{X}^n$ cannot be
encoded into any $u_i \in C$. Event $E(f, C)$ is a decoder failure;
the decoder is, even with its side information $y$, unable to
extract the correct $u_i$ from the specified bin.

Since $P^n_{XY}(E'(f, C))$ decays as $e^{-n\delta}$, we cannot determine
a non-trivial exponential bound on $P^n_{XY}(E'(f, C))$. Our approach
therefore will be to ignore the encoder error behavior
entirely, and we shall only require that a code $(f, C) \in \mathcal{F} \times \mathcal{C}$
satisfy $P^n_{XY}(E'(f, C)) \to 0$. In this subclass of codes, we define
an error exponent based on the decoder failure event $E(f, C)$:

$$\theta(R, UXY) = \sup_{(f, C) \in \mathcal{F} \times \mathcal{C}} \ \limsup_{n \to \infty} \ -\frac{1}{n} \log P^n_{XY}(E(f, C)).$$


We believe that this definition is of significance because
the event $E(f, C)$ arises in virtually every multiterminal
source coding configuration. Moreover, as recently shown by
Shimokawa, Han and Amari [4], a variation of $E(f, C)$ takes
on particular importance in the multiterminal hypothesis testing
problem. The lossy source coding model of Wyner and Ziv
provides a canonical setup for studying this multiterminal
error exponent.

III.  Bounds  on  the  Error  Exponent 
It is easy to see that $\theta(R, UXY) = 0$ for $R < I(X; U|Y)$
and $\theta(R, UXY) = \infty$ for $R > I(X; U)$. For rates $R \in
[I(X; U|Y), I(X; U)]$, we define:

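$$\theta_U(R, UXY) = \min_{\tilde{U}\tilde{X}\tilde{Y} :\ P_{\tilde{X}\tilde{U}} = P_{XU},\ R \le I(\tilde{X}; \tilde{U}|\tilde{Y})} D(\tilde{U}\tilde{X}\tilde{Y} \| UXY),$$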

and 

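$$\theta_L(R, UXY) = \min_{\tilde{U}\tilde{X}\tilde{Y}} D(\tilde{U}\tilde{X}\tilde{Y} \| UXY) + \left| R - I(\tilde{X}; \tilde{U}) + I(\tilde{Y}; \tilde{U}) \right|^+.$$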

Theorem  1 

$$\theta_L(R, UXY) \le \theta(R, UXY) \le \theta_U(R, UXY).$$

The lower bound is proved by random selection of codes
over the ensemble $\mathcal{F} \times \mathcal{C}$, and the upper bound follows by a
sphere-packing argument [3]. As a check, we observe that
$\theta_L(R, UXY)$ is strictly positive when $R > I(X; U|Y)$;
accordingly, our result yields another proof of the direct part
of the Wyner-Ziv theorem. We also note that the sphere-packing
and random-coding bounds need not agree, even for
$R \approx I(X; U|Y)$. We conjecture that the random-coding
bound is tight near the lower rate boundary; a more clever
application of the sphere-packing technique might close the
gap and prove our conjecture.
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Abstract  —  We  consider  the  rate-distortion  function 
for successive refinement by partitioning and determine
error exponents for two-step coding. It is seen
that even when the rate-distortion functions for one-
and two-step coding coincide, the error exponent in
the former case may exceed that in the latter.

I.  Introduction 

Given a discrete memoryless source (DMS) with probability
mass function (pmf) $P$, and a suitable distortion measure, the
minimum rate of coding at distortion $\Delta_1$ is given by the
rate-distortion function $R(P, \Delta_1)$. If a finer description is required,
say with distortion $\Delta_2 < \Delta_1$, additional information can be
provided at rate $R_2 - R(P, \Delta_1)$. Clearly, $R_2 \ge R(P, \Delta_2)$.
The minimum value of $R_2$ is the two-step rate-distortion function,
$R(P, \Delta_1, \Delta_2)$ (Rimoldi [4]). The Markov condition under
which $R(P, \Delta_1, \Delta_2) = R(P, \Delta_2)$ was determined independently
by Koshelev [3] and Equitz-Cover [2].

II.  Preliminaries 

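Let $\mathcal{X}$ be a finite set and $\{X_t\}_{t=1}^{\infty}$ be an $\mathcal{X}$-valued DMS with
pmf $P$. Let $\mathcal{Y}_1$ and $\mathcal{Y}_2$ be finite reproduction alphabets, and let
$d_i : \mathcal{X} \times \mathcal{Y}_i \to \mathbb{R}^+$, $i = 1, 2$, be nonnegative-valued mappings that
induce distortion measures on $\mathcal{X}^n \times \mathcal{Y}_i^n$, $i = 1, 2$, according to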


where $(f_1, \phi_1)$ is of rate $R_1$ and distortion $\Delta_1$. Further, let


$$F_1 = \inf_{Q :\ R(Q, \Delta_1) > R_1 \ \text{or}\ R(Q, R_1, \Delta_1, \Delta_2) > R_2} D(Q \| P),$$

$$F_2 = \inf_{Q :\ R(Q, R_1, \Delta_1, \Delta_2) > R_2} D(Q \| P).$$


Our main results establish that there exists a sequence of
two-step codes of rates $(R_1, R_2)$ with

$$\frac{1}{n} \log e_1 \le -F_1 + \delta_1,$$

$$\frac{1}{n} \log e_2 \le -F_2 + \delta_2,$$

for any $\delta_1, \delta_2 > 0$. Further, for any sequence of codes of rates
$(R_1, R_2)$

$$\liminf_{n \to \infty} \frac{1}{n} \log e_1 \ge -F_1,$$

$$\liminf_{n \to \infty} \frac{1}{n} \log e_2 \ge -F_2.$$

Finally, even with the Markov condition [2, 3] in effect, so that
$R(P, R_1, \Delta_1, \Delta_2) = R(P, \Delta_2)$, $\Delta_2 < \Delta_1$,


di(x,y)  =  x  €  Xn,y  6  yj1,  i  =  1,2. 

A  two-step  n-length  block  code  consists  of  two  encoder- 
decoder  pairs  ft  :  X71  ->  Mi  =  {1 ,  : 

n;.i  ->  y i ,  i = i,  2. 

For given rate $R_1 > 0$ and distortions $\Delta_1 > \Delta_2 > 0$ let
$R(P, R_1, \Delta_1, \Delta_2)$ denote the minimum rate of the two-step
code when the first-step code has rate $R_1$ and distortion $\Delta_1$
and the two-step code has distortion $\Delta_2$. It constitutes the
rate-distortion function for the refining code and follows
immediately from [4]:

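$$R(P, R_1, \Delta_1, \Delta_2) = \inf I(X \wedge Y_1 Y_2),$$

where the infimum is over joint distributions satisfying
$P_X = P$, $E d_1(X, Y_1) \le \Delta_1$, $E d_2(X, Y_2) \le \Delta_2$, and $I(X \wedge Y_1) \le R_1$.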


III.  Main  Results 

For  convenience,  we  define 


it is possible that $F_1 < F_2$. This is illustrated by the simple
example of a DMS with Hamming distance distortion measure,
where a simple necessary and sufficient condition for the error
exponents to differ is

$$R_2 - R(P, \Delta_2) > R_1 - R(P, \Delta_1).$$
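The condition above is easy to evaluate numerically once $R(P, \Delta)$ is known. The sketch below assumes, purely for illustration, a binary (Bernoulli) source, for which the Hamming rate-distortion function has the familiar closed form $h(p) - h(\Delta)$; the source in the text is a general DMS, and the function names are hypothetical. Rates are in bits.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_distortion_binary(p, delta):
    """R(P, delta) for a Bernoulli(p) source under Hamming distortion."""
    q = min(p, 1.0 - p)
    if delta >= q:
        return 0.0
    return h2(q) - h2(delta)


def exponents_differ(p, r1, r2, delta1, delta2):
    """Check R2 - R(P, delta2) > R1 - R(P, delta1) for the binary example."""
    slack2 = r2 - rate_distortion_binary(p, delta2)
    slack1 = r1 - rate_distortion_binary(p, delta1)
    return slack2 > slack1


print(exponents_differ(0.5, r1=0.3, r2=0.9, delta1=0.25, delta2=0.1))
```

Each side of the inequality is the excess of a coding rate over the corresponding rate-distortion value, so the check compares the rate margins of the two steps.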
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Cl  =  Pr  (di(Xn ,<pi(fi(Xn)))  >  Ai 

or  d2(Xn,Mfi(Xn),f2(Xn)))  >  A2) , 
ci  =  Pr(d2(Xn,Mfi(Xn),f2(Xn)))>A2), 
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I.  Direct  Coding  with  High  Resolution 

Let $\{(X_i, Y_i)\}_{i=1}^{\infty}$ be a sequence of independent drawings of a
pair of dependent continuous random variables. It is desired
to block encode $X = X_1 \ldots X_n$ and $Y = Y_1 \ldots Y_n$ separately,
and decode them jointly, such that $\frac{1}{n} E \sum_{i=1}^{n} (X_i - \hat{X}_i)^2 \le D_x$
and $\frac{1}{n} E \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \le D_y$. Let $\mathcal{R}(D_x, D_y) = \{(R_1, R_2)\}$
denote the set of rate pairs of X- and Y-encoders which satisfy
these constraints for some $n$. Assume that the joint differential
entropy $h(X, Y)$ exists and is finite, and let $\mathcal{R}^*(D_x, D_y)$ be the
set of $(R_1, R_2)$ pairs which satisfy

$$R_1 \ge h(X|Y) - \tfrac{1}{2} \log 2\pi e D_x$$
$$R_2 \ge h(Y|X) - \tfrac{1}{2} \log 2\pi e D_y \qquad (1)$$
$$R_1 + R_2 \ge h(X, Y) - \tfrac{1}{2} \log (2\pi e)^2 D_x D_y.$$

Note that $\mathcal{R}^*(D_x, D_y)$ has the known "broken corner" structure
of the Slepian-Wolf rate region.
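For a concrete feel for (1), the three right-hand sides can be evaluated in closed form when $(X, Y)$ is jointly Gaussian, using $h(X|Y) = \frac{1}{2}\log 2\pi e \sigma_X^2(1-\rho^2)$ and the analogous expressions for $h(Y|X)$ and $h(X, Y)$. The Gaussian specialization and the function names are assumptions of this sketch; rates are in bits.

```python
import math


def gaussian_outer_bound(var_x, var_y, rho, dx, dy):
    """Right-hand sides of (1), in bits, for jointly Gaussian (X, Y)."""
    one_minus = 1.0 - rho ** 2
    r1 = 0.5 * math.log2(var_x * one_minus / dx)
    r2 = 0.5 * math.log2(var_y * one_minus / dy)
    rsum = 0.5 * math.log2(var_x * var_y * one_minus / (dx * dy))
    return r1, r2, rsum


def in_region(r1, r2, bounds):
    """True if (r1, r2) satisfies all three inequalities of (1)."""
    b1, b2, bsum = bounds
    return r1 >= b1 and r2 >= b2 and r1 + r2 >= bsum


bounds = gaussian_outer_bound(1.0, 1.0, rho=0.5, dx=0.01, dy=0.01)
print(bounds, in_region(bounds[0], bounds[1], bounds))
```

For correlated sources the sum bound exceeds the sum of the two individual bounds by $-\frac{1}{2}\log(1-\rho^2)$, so the point formed by the two individual minima lies outside the region. That is the broken corner.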

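Theorem 1 (Shannon Outer Bound) For any $D_x$ and $D_y$,

$$\mathcal{R}(D_x, D_y) \subseteq \mathcal{R}^*(D_x, D_y). \qquad (2)$$

Furthermore, if $E\{X^2\} < \infty$, $E\{Y^2\} < \infty$ and $h(X, Y) >
-\infty$, then, as $D_x, D_y \to 0$, the outer bound (2) becomes
achievable, i.e., $\mathcal{R}(D_x, D_y) \sim \mathcal{R}^*(D_x, D_y)$, where $\sim$ means
that for any $w_1 > 0$ and $w_2 > 0$,

$$\min_{(R_1, R_2) \in \mathcal{R}} \{w_1 R_1 + w_2 R_2\} - \min_{(R_1, R_2) \in \mathcal{R}^*} \{w_1 R_1 + w_2 R_2\} \to 0. \qquad (3)$$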

This  theorem  has  a  straightforward  extension  to  general 
difference  distortion  measures. 

II.  Remote  Coding  with  High  Resolution 

Now restrict our attention to the Gaussian case, but consider
the following more general, indirect coding problem [1, pp.
78, 124]. We need to reconstruct a (memoryless) zero-mean
vector source $\theta = \theta_1 \ldots \theta_m$, jointly Gaussian with $(X, Y)$, from
separate encodings of $X$ and $Y$, with averaged squared errors
$D_1 \ldots D_m$. We refer to $\theta$ as the remote source, and to $X$
and $Y$ as its noisy measurements. Denote by $\mathcal{R}(D_1 \ldots D_m) =
\{(R_1, R_2)\}$ the set of admissible rate pairs of the X- and
Y-encoders.

When $X$ and $Y$ are available with infinite resolution,
the optimal reconstruction of $\theta$ is the conditional mean
$E(\theta|X, Y) = H \cdot (X, Y)^t$, where $H$ is some $m \times 2$ matrix.
The mean squared errors are then the diagonal elements
$D_1^{opt} \ldots D_m^{opt}$ of the conditional covariance matrix
$COV(\theta|X, Y)$. Clearly we can only satisfy distortions $D_i \ge
D_i^{opt}$, and usually when $(D_1, \ldots, D_m) \to (D_1^{opt}, \ldots, D_m^{opt})$ the
coding rates must go to infinity. We are interested in the
asymptotic behavior in this limit. Let $\mathcal{K} = \{K\}$ be the set of
all $2 \times 2$ nonnegative definite matrices which satisfy

$$(H K H^t)_{ii} \le D_i - D_i^{opt}, \quad \text{for } i = 1 \ldots m, \qquad (4)$$

°This  research  was  supported  in  part  by  the  Wolfson  Research 
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where  H  is  the  matrix  associated  with  the  optimal  estimator 
defined  above,  and  define 

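$$\mathcal{R}^{**}(D_1 \ldots D_m) = \bigcup_{\{D_x, D_y :\ \mathrm{diag}[D_x, D_y] \in \mathcal{K}\}} \mathcal{R}^*(D_x, D_y), \qquad (5)$$

where the union is taken over all diagonal matrices in $\mathcal{K}$ whose
diagonal elements are $D_x$ and $D_y$, and the rate region $\mathcal{R}^*$ was
defined in (1).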

Theorem 2 Assume that $E\{X^2\} < \infty$, $E\{Y^2\} < \infty$ and
$h(X, Y) > -\infty$, and that $H$ does not have all-zero columns.
Then, as $(D_1, \ldots, D_m) \to (D_1^{opt}, \ldots, D_m^{opt})$, the region of
admissible rate pairs satisfies

$$\mathcal{R}(D_1 \ldots D_m) \sim \mathcal{R}^{**}(D_1 \ldots D_m), \qquad (6)$$

where the notation $\sim$ was defined in (3).

III.  The  Loss  due  to  Separate  Encoding 

To gain some insight into these results, we examine below the
total rate loss caused by the separation of the encoders in the
high resolution limit. In the direct coding case (Section I) this
loss is zero, since the rate in jointly encoding $X$ and $Y$ is given
asymptotically by the Shannon lower bound [1, p. 92], which
coincides with the minimal rate sum of the separate encoders
given in the third line of (1).

In the indirect coding case (Section II), however, the loss
may be positive. Let $\mathcal{K}^{**}$ denote the subset of diagonal matrices
in $\mathcal{K}$, and let $\det^*$ and $\det^{**}$ denote the maximum
determinants over all matrices in $\mathcal{K}$ and $\mathcal{K}^{**}$, respectively. As
$(D_1 \ldots D_m) \to (D_1^{opt} \ldots D_m^{opt})$, the rate sum of the X- and
Y-encoders exceeds the rate of the joint encoder by


This loss is due to the fact that at high resolution the quantization
errors made by the separate encodings of $X$ and $Y$
are effectively uncorrelated, and so we cannot take advantage
of the shaping gain. When the number of remote sources is
smaller than the number of measurements (i.e., when $m = 1$),
the quantity (7) equals infinity. In fact, the rate sum of the
separate encoders in this case is roughly twice the rate of the
joint encoder (which diverges to infinity). This is because the
separate encoders quantize two measurements, while the joint
encoder effectively quantizes one continuous random variable,
$E(\theta|X, Y)$, at about the same resolution.
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I.  Introduction 

Differential layer encoding in progressive transmission refers
to the process of generating additional bits which, in conjunction
with a low resolution version of the image, enable the
decoder to reconstruct the high resolution image. A practical
algorithm for the progressive transmission of black and
white images is the JBIG algorithm [1], which primarily uses
an arithmetic coder for differential layer encoding. We consider
differential layer encoding as an instance of coding with
side information known to both the encoder and the decoder.
Based on this idea, we present a sliding window Lempel-Ziv
algorithm for differential layer encoding, and apply it to compress
black and white images. The algorithm presented here
can also be applied to other problems such as successive
refinement of information [4].

II.  Coding  with  Side  Information 

Consider a source coding situation where both encoder and
decoder know a sequence $X_1^{\infty}$ of letters drawn from a finite
alphabet $\mathcal{X}$. The encoder now needs to transmit a sequence
$Y_1^{\infty}$ of letters drawn from an alphabet $\mathcal{Y}$. This is the problem
of coding with side information known to both the encoder
and the decoder. The sequence $X_1^{\infty}$ is known as the side
information. An algorithm for this data compression problem is
as follows.

The  Parsing 

1.  Initialization - First, we fix a window size $n_w$. Transmit
the first $n_w$ symbols of the Y sequence, $Y_1^{n_w}$, without
any compression.

2.  Matching - Let $L^1$ be the largest integer such that
a copy of $(XY)_{n_w+1}^{n_w+L^1-1}$ begins in the current window
$(XY)_1^{n_w}$. Let the copy begin at position start. Denote
$Y_{n_w+1}^{n_w+L^1}$ as $Y^1$, the first phrase.

3.  Sliding - Define the new window to be $(XY)_{L^1+1}^{n_w+L^1}$.

Repeat steps (2.) and (3.) as many times as necessary until
the sequence is exhausted. Note that this parsing is identical
to that produced by applying the sliding window Lempel-Ziv
algorithm [2] to the XY sequence.
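The parsing steps can be sketched as follows. This is a minimal illustration of the parsing only, under stated assumptions: indices are 0-based, the copy may run past the end of the window as in ordinary sliding-window Lempel-Ziv, every phrase is a matched portion followed by one fresh symbol, and the binary representation of the phrases is omitted. The function name and return format are invented here.

```python
def parse_with_side_info(xs, ys, n_w):
    """Parse ys into phrases by matching (x, y) pairs against a sliding window.

    Returns a list of (start, length, y_phrase) tuples, where start is the
    0-based position at which the matched copy begins (None if no match).
    """
    pairs = list(zip(xs, ys))
    n = len(pairs)
    phrases = []
    pos = n_w  # the first n_w symbols of ys are assumed sent uncompressed
    while pos < n:
        best_len, best_start = 0, None
        for s in range(pos - n_w, pos):  # candidate starts inside the window
            m = 0
            # stop one short of the end so a final fresh symbol always exists
            while pos + m < n - 1 and pairs[s + m] == pairs[pos + m]:
                m += 1
            if m > best_len:
                best_len, best_start = m, s
        length = best_len + 1  # matched portion plus the last symbol
        phrases.append((best_start, length, ys[pos:pos + length]))
        pos += length  # the window slides forward by one phrase
    return phrases


phrases = parse_with_side_info("aaaaaaaa", "01010101", n_w=2)
print(phrases)
```

Because the decoder also knows `xs`, a full coder would only need to identify the match start among window positions where the X-portion already agrees, which is what the phrase representation below exploits.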

Representation  of  the  phrases  in  binary 

The phrase $Y^1 = Y_{n_w+1}^{n_w+L^1}$ consists of two parts, the
matched portion $Y_{n_w+1}^{n_w+L^1-1}$ and the last symbol $Y_{n_w+L^1}$.

We thus need to specify three things.

1.  The last symbol, represented using $\lceil \log(|\mathcal{Y}|) \rceil$ bits.

2.  The length of the phrase, $L^1$, which can be represented
using $\lceil \log L^1 \rceil + 2 \lceil \log \log L^1 \rceil$ bits [3].

3.  Starting point of the match - Let

$N_{X^1} = |\{k \text{ s.t. } \text{start} \le k \le n_w,\ X_k^{k+L^1-2} = X_{n_w+1}^{n_w+L^1-1}\}|$

Then, the starting point of the match can be specified
using $\lceil \log N_{X^1} \rceil + 2 \lceil \log \log N_{X^1} \rceil$ bits.


4.  If the number of bits needed to represent the phrase using
(1.), (2.), and (3.) exceeds $L^1 \lceil \log(|\mathcal{Y}|) \rceil$ bits, then
encode the phrase without any compression. The number
of bits needed to do this is less than $\gamma_2 L^1$.

Let the total number of bits needed to represent the first
phrase be denoted by $B(Y^1)$. Then,

$$B(Y^1) \le \min \{\log N_{X^1} + 2 \log \log N_{X^1} + \gamma_1 \log L^1,\ \gamma_2 L^1\}$$

Lemma 1 Consider the sequence $(XY)_{-\infty}^{\infty}$ produced by a stationary
ergodic source. Define for $l > 0$,

$$W_l(xy) = \min\{k : k > 0,\ (xy)_0^{l-1} = (xy)_{-k}^{-k+l-1}\},$$

$$N_X(l) = |\{k : k \le W_l(xy),\ x_0^{l-1} = x_{-k}^{-k+l-1}\}|.$$

Then, $\Pr\{N_X(l) > 2^{l(H(Y|X)+\epsilon)}\} \to 0$ as $l \to \infty$,

where $H(Y|X)$ is the conditional entropy.

Based on this lemma, we conjecture that, if the above algorithm
is used to parse an input of $N$ symbols into exactly $c$
phrases, then

$$\lim_{n_w \to \infty} \lim_{N \to \infty} E\left( \frac{1}{N} \sum_{j=1}^{c} B(Y^j) \right) = H(Y|X).$$

We are currently working to prove this conjecture.

III.  Application  to  Compression  of  Black  and 
White  Images 

We  start  out  with  a  black  and  white  image  of  size  1728  by  2376 
pixels.  A  resolution  reduction  algorithm  is  applied  to  this  im¬ 
age,  resulting  in  an  image  of  size  864  by  1188.  This  image 
is  scanned  in  a  raster  scan  fashion  to  produce  a  sequence  X. 
Each  pixel  in  X  is  associated  with  four  pixels  in  the  origi¬ 
nal.  The  values  of  these  four  pixels  constitute  the  sequence 
Y .  The  above  algorithm  for  coding  with  side  information  is 
then  applied  to  these  two  sequences.  Applying  this  algorithm 
to  the  CCITT  facsimile  test  documents  resulted  in  an  average 
compression  ratio  of  21  :  1. 

Abstract - A hierarchical lossless source code compresses
data by means of a graph used to represent the
data. We show that the hierarchical codes which perform
best as the number of data samples grows have
a compression performance that can be characterized
via a notion of the dimension of the data which we
call compression dimension.

I.  Data  Representation  via  Graphs 
Throughout this summary, we fix a finite set $A$ as our source
alphabet. Let $x$ be a generic notation for a data string that
can be formed from the symbols in $A$, whose length $|x|$ satisfies
$2 \le |x| < \infty$. The notation $G$ shall be a generic notation for
a finite directed acyclic rooted graph whose edges are ordered
and whose terminal vertices are each labelled by a symbol from
$A$. Each graph $G$ gives rise to a data sequence $x$ in a natural
way, and we shall denote this state of affairs by the notation
$G \to x$. We indicate how this is done with an example.

Example. We define a graph $G$ with nine edges ordered as
$e_1, e_2, \ldots, e_9$, and six vertices denoted $v_0, v_1, \ldots, v_5$, where $v_0$
is the root vertex and $v_4, v_5$ are the terminal vertices. Edges
$e_1, e_2$ lead from $v_0$ to $v_4$; $e_3$ leads from $v_0$ to $v_1$; $e_4, e_5$ lead from
$v_1$ to $v_2$; $e_6, e_7$ lead from $v_2$ to $v_3$; and $e_8, e_9$ lead from $v_3$ to $v_5$.
Vertices $v_4, v_5$ are labelled with the symbols 0, 1, respectively.
(The alphabet $A$ is {0, 1} in this example.) Starting with
the sequence $e_1 e_2 e_3$ of ordered edges emanating from the root
vertex, we perform the following steps:

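(1)  $e_1 e_2 e_3 \to v_4 v_4 v_1$

(2)  $v_4 v_4 v_1 \to v_4 v_4 e_4 e_5$

(3)  $v_4 v_4 e_4 e_5 \to v_4 v_4 v_2 v_2$

(4)  $v_4 v_4 v_2 v_2 \to v_4 v_4 e_6 e_7 e_6 e_7$

(5)  $v_4 v_4 e_6 e_7 e_6 e_7 \to v_4 v_4 v_3 v_3 v_3 v_3$

(6)  $v_4 v_4 v_3 v_3 v_3 v_3 \to v_4 v_4 e_8 e_9 e_8 e_9 e_8 e_9 e_8 e_9$

(7)  $v_4 v_4 e_8 e_9 e_8 e_9 e_8 e_9 e_8 e_9 \to v_4 v_4 v_5 v_5 v_5 v_5 v_5 v_5 v_5 v_5$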

In the odd numbered steps, the sequence on the right is obtained
by replacing each edge in the sequence on the left with
the vertex to which that edge leads. In the even numbered
steps, the sequence on the right is obtained by replacing each
non-terminal vertex in the sequence on the left with the string
of ordered edges emanating from that vertex. The final sequence
on the right in (7) consists entirely of terminal vertices;
to obtain the data string $x$ such that $G \to x$, one replaces each
terminal vertex in this final sequence by the label for that vertex.
We see in this case that $x = 0011111111$.
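
The expansion in the example can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `expand` and the dictionary encoding of the graph (each non-terminal vertex mapped to the ordered list of vertices its edges lead to) are assumptions made for illustration. Each pass of the loop performs one even-numbered step followed by one odd-numbered step.

```python
def expand(children, labels, root):
    """Expand a rooted, ordered DAG into the data string it represents.

    children[v] lists, in edge order, the vertices reached from the
    non-terminal vertex v; labels[v] is the symbol of terminal vertex v.
    """
    sequence = [root]
    while any(v not in labels for v in sequence):
        expanded = []
        for v in sequence:
            if v in labels:
                expanded.append(v)            # terminal vertices stay put
            else:
                expanded.extend(children[v])  # follow the ordered edges
        sequence = expanded
    return "".join(labels[v] for v in sequence)

# The graph of the example: nine edges, six vertices, root v0.
children = {
    "v0": ["v4", "v4", "v1"],  # e1, e2, e3
    "v1": ["v2", "v2"],        # e4, e5
    "v2": ["v3", "v3"],        # e6, e7
    "v3": ["v5", "v5"],        # e8, e9
}
labels = {"v4": "0", "v5": "1"}
x = expand(children, labels, "v0")
```

Here nine edges suffice to represent a string of length ten; longer chains of doubling vertices widen that gap quickly.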

II.  Coding  Problem  for  Hierarchical  Codes 
If $G$ is a graph, we let $v(G)$, $e(G)$ denote the number of
vertices and the number of edges in $G$, respectively. In the
following, all logarithms are to base two. For the purposes
of this summary, we define a lossless source code $\alpha$ to be a
one-to-one map which assigns to each data string $x$ a binary
codeword $\alpha(x)$. Informally, we want to think of a hierarchical
code as a lossless source code in which the codeword for $x$
is generated incrementally as one traverses the edges of some
graph $G_x \to x$, each edge (or group of edges) contributing at
least one bit to the codeword. Formally, we define a lossless
source code $\alpha$ to be hierarchical if there exist positive real
constants $C_1, C_2$ such that for each $x$,

$C_1 e(G_x) \le |\alpha(x)| \le C_2 e(G_x) \log v(G_x)$

for some graph $G_x \to x$. For example, finite-state sequential
codes, the Lempel-Ziv code, and bintree codes are hierarchical
by this definition. We are interested in the problem of characterizing
those hierarchical lossless source codes $\alpha$ for which the
codeword length $|\alpha(x)|$ grows most slowly as $|x| \to \infty$. Specifically,
we characterize those hierarchical lossless source codes
for which the "logarithmic compression rate" $\log |\alpha(x)| / \log |x|$
is minimized as $|x| \to \infty$. In the next section, we introduce
the concept of compression dimension to solve our problem.

III.  Solution via Compression Dimension

We define the compression dimension $\mathrm{Dim}(x)$ of the data
string $x$ as the ratio $\log e(G_x) / \log |x|$, where $G_x$ is a graph
with the minimal number of edges for which $G_x \to x$. If $\alpha$ is
a hierarchical lossless source code, define $\mathrm{Dim}(x|\alpha)$ to be the
ratio $\log |\alpha(x)| / \log |x|$.

Let $S$ denote a data set consisting of infinitely many data
strings $x$. We define the compression dimension $\mathrm{Dim}(S)$ of
$S$ to be the limit supremum of $\mathrm{Dim}(x)$ as $|x| \to \infty$ through
members of $S$. If $\alpha$ is a hierarchical lossless source code, define
$\mathrm{Dim}(S|\alpha)$ to be the limit supremum of $\mathrm{Dim}(x|\alpha)$ as $|x| \to \infty$
through members of $S$.

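Theorem 1. For any hierarchical lossless source code $\alpha$,

(i)  $\liminf_{|x| \to \infty} \mathrm{Dim}(x|\alpha) / \mathrm{Dim}(x) \ge 1$.

(ii)  $\mathrm{Dim}(S|\alpha) \ge \mathrm{Dim}(S)$ for any $S$.

Theorem 2. There exists a hierarchical lossless source
code $\alpha^*$ such that

(i)  $\lim_{|x| \to \infty} \mathrm{Dim}(x|\alpha^*) / \mathrm{Dim}(x) = 1$.

(ii)  $\mathrm{Dim}(S|\alpha^*) = \mathrm{Dim}(S)$ for any $S$.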

Remarks. 

(1)  Parts (i)-(ii) of Theorem 2 do not hold if $\alpha^*$ is the
Lempel-Ziv code.

(2)  There are several useful bounds on $\mathrm{Dim}(x)$, which we shall
discuss in another work.

(3)  For  more  on  hierarchical  lossless  source  codes,  see  [1]. 

I.  Introduction

Much  research  work  has  been  done  in  finding  sequences  with 
good  autocorrelation  properties.  The  conventional  receiver 
structure  for  synchronizing  with  such  a  sequence  transmit¬ 
ted  periodically  is  a  matched  filter  matched  to  the  sequence. 
However,  synchronization  performance  of  the  receiver  can  be 
further  improved  if  the  receiver  tries  to  match  to  the  sequence 
and  dematch  the  circular  shifts  of  the  sequence.  In  [1],  the 
authors  solve  the  problem  of  finding  the  optimal  receiver  for 
synchronization  in  satellite  systems,  when  the  preamble  is  pre¬ 
ceded  by  random  data.  However,  the  problem  at  hand  is  dif¬ 
ferent  since  the  data  preceding  the  synchronization  sequence  is 
known.  We  derive  the  optimal  linear  receiver  to  synchronize 
with  a  given  sequence  and  demonstrate  the  synchronization 
gain  achieved  by  employing  such  a  receiver  over  the  conven¬ 
tional  receiver.  This  gain  is  obtained  by  simply  changing  the 
receiver  coefficients,  leaving  the  receiver  structure  unchanged. 

II.  The  optimal  linear  receiver 
Let $S = \{s_0, s_1, s_2, \ldots, s_{n-1}\}$; $s_i \in \{-1, 1\}$ be a sequence and
$S_K = \{s_K, s_{1+K}, s_{2+K}, \ldots, s_{n+K-1}\}$ denote a circular shift of
$S$ by $K$. All the additions in the above equation are modulo $n$.
Let $r_{ss}(K) = \frac{1}{\sqrt{n}} \langle S, S_K \rangle$ denote the autocorrelation of $S$
and $r_{ss}^{max} = \max_{K \in [1,..,n-1]} r_{ss}(K)$ be its maximum off peak
autocorrelation. Assume that the sequence $S$ is being transmitted
periodically and the receiver is trying to synchronize
with it. This can be modeled by letting the received sequence
be $\{r_i;\ i = -\infty,..,0,..\}$, where $r_i = s_{i'} + n_i$, the $n_i$ are samples
of a white Gaussian noise process and $i' = (i - k)$ modulo $n$.
This problem occurs in the synchronization of spread spectrum
systems [2] and in CDMA systems [4]. For a sequence
$X = \{x_0, x_1, \ldots, x_{n-1}\}$; $x_i \in \mathbb{R}$, $\langle X, X \rangle = 1$, consider the linear
receiver $L_X(k') = \langle S_{k-k'}, X \rangle + N_{k'}$. Substituting $x_i = s_i / \sqrt{n}$
gives us the conventional receiver, which matches the incoming
data to the sequence $S$. A measure of goodness of the code
$S$ is the difference $r_{ss}(0) - r_{ss}^{max} = \sqrt{n} - r_{ss}^{max}$. The larger this
difference, the better the estimate of $k$ in an additive white
Gaussian noise scenario. Without loss of generality,
let $r_{sx}(0)$ denote the maximum correlation between
the sequence $S$ and $X$ and let $r_{sx}^{max} = \max_{K \in [1,..,n-1]} r_{sx}(K)$.

Now  consider  the  following  optimization  problem; 

Problem 1

$\max_{X \in \mathbb{R}^n} \{r_{sx}(0) - r_{sx}^{max}\};\ \|X\| = 1$

$\Rightarrow \max_{X \in \mathbb{R}^n} \min_{K \in [1,..,n-1]} \langle S - S_K, X \rangle;\ \|X\| = 1$  (1)

Geometrically speaking, equation (1) finds that unit vector
$X \in \mathbb{R}^n$ which is closest to the collection of vectors
$\{S - S_K;\ K = 1,..,n-1\}$, in the sense that the minimum of the
projections of the vectors $S - S_K$ on $X$ is maximized. We now
give the solution to the optimization problem.
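
The objective in equation (1) is easy to evaluate for any candidate receiver vector. The sketch below, in plain Python, computes it for the conventional receiver $X = S/\sqrt{n}$. The helper names and the length-7 test sequence (a short maximal-length sequence chosen only for illustration) are assumptions and do not come from the text; the sketch does not solve the optimization itself.

```python
import math

def circular_shift(seq, k):
    """Return S_K: entry i of the result is seq[(i + k) mod n]."""
    n = len(seq)
    return [seq[(i + k) % n] for i in range(n)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin(seq, x):
    """min over K = 1..n-1 of <S - S_K, X>, the objective of equation (1)."""
    n = len(seq)
    return min(
        inner([s - t for s, t in zip(seq, circular_shift(seq, k))], x)
        for k in range(1, n)
    )

# Illustrative sequence (assumption): a length-7 maximal-length sequence.
s = [1, 1, 1, -1, -1, 1, -1]
x_conv = [v / math.sqrt(len(s)) for v in s]  # conventional receiver
conventional_margin = margin(s, x_conv)
```

For this sequence every off peak autocorrelation equals $-1/\sqrt{7}$, so the conventional margin is $8/\sqrt{7}$; any unit vector with a larger `margin` would be a better receiver in the sense of Problem 1.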

Proposition 1 There exists $X \in \mathbb{R}^n$, $X$ not necessarily a unit
vector, satisfying the following conditions

1.  It is a linear combination of the vectors $\{S - S_K;\ K = 1,..,n-1\}$,
that is, $X = \sum_K \alpha_K (S - S_K)$.

2.  The solution lies within the cone of the vectors
$\{S - S_K;\ K = 1,..,n-1\}$, implying that the coefficients
$0 \le \alpha_K \le 1$.

3.  Let each $\alpha_K \ne 0$ be denoted by a variable $\alpha_l$. The
collection $\{\alpha_l;\ l = 1,..,m\}$, $m \le n - 1$ thus denotes the
set of all the non-zero $\alpha_K$'s. Let $S_l$ represent the vector
$S_K$ corresponding to $\alpha_l$. Then, the vectors $\{S_l;\ l = 1,..,m\}$
are linearly independent.

4.  Finally, let $\gamma = \min_{K=1,..,n-1} \langle S - S_K, X \rangle$, then
$\forall l$ $\langle S - S_l, X \rangle = \gamma$.

The vector $X_{opt} = X / \|X\|$ uniquely solves the optimization
problem.

III.  Examples 

Consider the length 31 Gold sequences; there are 33 in all
[3]. Two of these are the pseudo random sequences. Let $S$
denote the remaining 31 Gold sequences. The $\sqrt{n}(r_{ss}(0) - r_{ss}^{max})$
for these sequences can be seen [3] to be $(31 - 7) = 24$.
On the other hand, computing the optimal vector $X$ for these
sequences, the $\sqrt{n}(r_{sx}(0) - r_{sx}^{max})$ turns out to be 26.3, 26.4,
26, 27, 27, 27, 27, 25.9, 25.6, 25.6, 26.4, 25.6, 25.7, 26.4, 26.5,
25.6, 25.7, 25.7, 25.8, 26.4, 25.8, 26.5, 26.9, 25.6, 26.4, 25.7,
26.5, 25.6, 25.7, 25.7, 26.9. The gain in terms of the signal to
noise ratio can be seen to be between $20 \log(25.6/24) = 0.56$ dB
and $20 \log(27/24) = 1$ dB. This gain is also confirmed by plotting
the probability of false alarm against the signal to noise ratio
for the conventional and the optimal linear detectors. Since
the autocorrelation function of the pseudo noise sequences is
a delta function, we do not expect to get much gain for pseudo
noise sequences.
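
The quoted gain range follows directly from the listed values. As a quick check, the following Python snippet recomputes it; the variable names are illustrative, and the numbers are copied from the paragraph above.

```python
import math

# sqrt(n) * (r_ss(0) - r_ss^max) for the conventional receiver
baseline = 24.0

# sqrt(n) * (r_sx(0) - r_sx^max) for the optimal receiver, one per sequence
optimal = [
    26.3, 26.4, 26, 27, 27, 27, 27, 25.9, 25.6, 25.6, 26.4, 25.6, 25.7,
    26.4, 26.5, 25.6, 25.7, 25.7, 25.8, 26.4, 25.8, 26.5, 26.9, 25.6,
    26.4, 25.7, 26.5, 25.6, 25.7, 25.7, 26.9,
]

# Gain in dB of the optimal receiver over the conventional one
gains_db = [20.0 * math.log10(v / baseline) for v in optimal]
min_gain, max_gain = min(gains_db), max(gains_db)
```

The smallest and largest gains correspond to the values 25.6 and 27, which give roughly 0.56 dB and 1.02 dB.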

We  can  thus  conclude  that  for  sequences  with  large  off  peak 
correlation  values,  like  the  Gold  and  Kasami  sequences,  sub¬ 
stantial  synchronization  gain  can  be  achieved  by  simply  chang¬ 
ing  the  matched  filter  coefficients  of  the  receiver.  This  gain  is 
expected  to  decrease  as  the  length  n  of  the  sequence  employed 
increases.  However,  for  applications  involving  short  length 
pseudo  random  sequences,  the  gain  could  be  significant. 

All coherent communications systems
need  a  way  to  regenerate  the  transmitter's 
carrier  at  the  receiver.  Most  current 
communications  systems  do  not  send  short 
messages  and  so  this  regeneration  is 
performed  by  a  phase  locked  loop. (Ref.  6) 
However,  many  communications  systems 
now  being  considered,  e.g.,  networks  of 
small  instruments  on  Mars  or  large 
deployments  of  free  floating  oceanographic 
sensors,  will  transmit  short  messages 
separated  by  long  periods  of  transmitter 
shutdown. 

The  first  section  of  this  talk  presents  a 
maximum  likelihood  algorithm  for 
estimating  the  phase  and  frequency  of  a 
carrier  coming  from  one  of  these  burst 
transmitters,  when  the  carrier  is  observed 
against  a  background  of  white  Gaussian 
noise  and  for  enough  time  for  the  variance 
of  the  maximum  likelihood  estimate  to  be 
low.  The  algorithm  avoids  the  "uncountable 
infinity  of  devices”  which  caused  Reference 
1  to  conclude  the  maximum  likelihood 
algorithm  was  "clearly  unrealizable".  The 
talk  then  analyzes  the  performance  of  this 
algorithm  and,  in  the  process,  not  only 
provides  a  much  more  compact  proof  of 
some  of  the  classic  results  in  Reference  1 
through  4,  but  also  strengthens  them. 

Simulations of the algorithm's operation
at a moderate signal to noise ratio are compared with
the high SNR bound. The talk then outlines
a  similar  algorithm  to  estimate  the  phase  and 
frequency  of  the  decaying  sinusoids 
characteristic  of  physical  measurements. 
Both  algorithms  have  important  roles  in 
making  physical  measurements,  e.g.,  the 
proton's  gyromagnetic  ratio.  (Ref.  5) 

Although  the  algorithms  are  truly 
maximum  likelihood  only  for  asymptotically 
high  signal  to  noise  ratios,  their  performance 
is  imperceptibly  different  from  this  bound  at 
all  SNR's  of  practical  interest,  i.e.,  having  a 


good  algorithm  for  phase  and  frequency 
estimation  is  not  helpful,  if  the  data  is 
insufficient  for  a  fairly  accurate 
measurement  to  be  made.  The  input  to  both 
algorithms  is  a  coarse  frequency  estimate 
and  a  set  of  samples  representing  a  digitized 
segment  of  spectrum.  The  coarse  frequency 
estimate  can  easily  be  generated  by  taking 
an  FFT  of  the  samples  and  determining  the 
largest  bin.  For  convenience  in  the 
derivations,  the  talk  assumes  that  the 
amplitude  of  the  constant  sine  wave  is 
known  and  that  both  the  amplitude  and 
decay  constant  of  the  decaying  sine  wave 
also  are  known.  This  knowledge  is, 
however,  not  necessary  for  either  algorithm. 
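
The coarse frequency estimate described above can be sketched in a few lines of Python. This is an illustration only: the function name `coarse_frequency` is hypothetical, a direct DFT stands in for the FFT to keep the example dependency-free, and the test tone parameters are arbitrary assumptions. It covers only the coarse stage, not the maximum likelihood refinement.

```python
import cmath
import math

def coarse_frequency(samples, sample_rate):
    """Return the centre frequency of the largest-magnitude DFT bin."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for b in range(n // 2 + 1):  # non-negative frequencies only
        acc = sum(
            x * cmath.exp(-2j * math.pi * b * i / n)
            for i, x in enumerate(samples)
        )
        if abs(acc) > best_mag:
            best_bin, best_mag = b, abs(acc)
    return best_bin * sample_rate / n

# Illustrative input (assumption): a 5.2 Hz tone sampled at 64 Hz for 1 s.
n_samples, fs = 64, 64.0
tone = [math.cos(2 * math.pi * 5.2 * i / fs + 0.3) for i in range(n_samples)]
estimate = coarse_frequency(tone, fs)
```

With a bin spacing of 1 Hz the estimate lands on the 5 Hz bin, within half a bin of the true frequency, which is the kind of starting point a fine estimator would then refine.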

The  talk  concludes  with  a  short  section 
contrasting  the  two  maximum  likelihood 
algorithms  with  conventional  frequency 
measurement  techniques.  This  final  section 
demonstrates  that  the  results  in  this  talk  can 
greatly  improve  most  frequency 
measurements that are signal to noise ratio
limited.

Abstract - Encoding and decoding schemes, presented
in this paper, are aimed at enabling transfer
of data through a channel in which two types of interference
are added to the transmitted signal and the
sum is quantized. One of these interferences is known
(note that the input of the quantizer is not accessible),
whereas the second is an AWGN. An upper bound on
the error rate, contributed by the component codes of
a multilevel code, has been developed for multistage
decoding.

Summary 

The  encoding  and  decoding  schemes  presented  in  this  pa¬ 
per  are  aimed  at  enabling  transfer  of  data  through  a  channel 
that  is  different  from  the  conventional  additive  white  Gaussian 
noise  (AWGN)  channel.  Here  we  are  interested  in  a  channel 
where  two  types  of  interference  are  added  to  the  transmitted 
signal  and  the  sum  is  quantized.  One  of  these  interferences 
is  known  (or  can  be  approximated),  denoted  by  a,  whereas 
the  second  is  an  AWGN,  denoted  by  n.  Since  the  input  of 
the  quantizer  is  not  accessible,  the  known  interference  can  not 
be  removed  from  the  received  signal.  We  will  show  that  the 
error  rate  for  an  uncoded  transmission  through  this  channel  is 
unacceptably  large,  even  for  low  noise  levels  and  linear  quanti¬ 
zation.  It  will  be  shown  that  the  problem  becomes  even  more 
severe  when  a  non-linear  quantization  is  present.  Therefore, 
coding  is  essential  and  huge  coding  gain  is  achievable  in  this 
application. 

Investigation  of  the  uncoded  error  events  leads  to  the  con¬ 
clusion  that  a  multilevel  coding  (see  e.g.,  [1])  is  an  efficient  so¬ 
lution.  Note  that  coded-modulation  structures,  including  mul¬ 
tilevel  coding  schemes,  have  been  designed  mostly  for  AWGN 
channels.  The  component  codes  of  a  multilevel  code,  employed 
over  an  AWGN  channel,  should  be  selected  such  that  the  min¬ 
imum  Euclidean  distance  between  the  transmitted  sequences 
would  increase.  However,  this  design  rule  is  not  applicable  for 
our  channel. 

We  derived  a  new  metric,  required  for  maximum  likelihood 
decoding  of  data  received  over  the  foregoing  channel.  How¬ 
ever,  an  important  parameter  of  a  coded  system  is  the  compu¬ 
tational  complexity  of  the  decoder.  The  multistage  decoder 
(see  e.g.,  [1])  is  an  efficient  scheme  for  decoding  multilevel 
codes.  The  decoder  employs  a  separate  binary  decoder  for 
each  component  code.  In  order  to  decode  a  component  code, 
maximum  likelihood  decoding  is  performed  under  the  assump¬ 
tion  that  the  bits  related  to  higher  partition  levels  are  un¬ 
coded,  and  that  the  data  transferred  from  the  decoders  for  the 
codes  related  to  the  lower  partition  levels  are  correct.  How¬ 
ever,  it  can  be  shown  that  the  reduction  in  the  coding  gain, 
due  to  multistage  decoding,  is  very  small,  whereas  the  reduc¬ 
tion  in  complexity  is  substantial. 

Let $r_j$ be one of the quantization levels. Let $s_i$ be the $i$-th
element in the alphabet of the transmitted symbols. Let $s_{i_0}$
and $s_{i_1}$ be the closest symbols to the value $r_j - a$, where $i_0$
and $i_1$ are even and odd labels, corresponding to the least
significant bit of the symbol being 0 or 1, respectively. Note
that if the $k$-th input to the receiver is $r_j$, the $k$-th output of
the decoder for $C_1$ (the code related to the first partition level)
should be either $s_{i_0}$ or $s_{i_1}$. The metric related to the symbol
$s_i$ is $\log(\Pr[r_j | s_i, a])$, where $s_i$ can be substituted by $s_{i_0}$ or
$s_{i_1}$. The even/odd characteristic of the sequence at the output
of this decoder should construct a codeword in $C_1$, which is
applied to the least significant bits of the symbol sequence.
The decoding of the other component codes, corresponding
to other bits in the symbol label, is performed in a similar
fashion.

Let $\delta_{ija}$ be the distance between the sum $s_i + a$ and the
threshold of the decision region of $r_j$, where a negative value
of $\delta_{ija}$ indicates that the sum is outside the decision region.
The conditional probability of the channel output is given by

$\Pr[r_j | s_i, a] = Q(-\delta_{ija} / \sigma)$,

where $\sigma^2$ is the variance of $n$ and $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2} dy$.
Based on the conditional probability it can be shown that the
metric corresponding to a maximum likelihood decoding of a
component code is a function of $\sigma^2$. However, in many cases
the level of the noise is unknown. Therefore, a metric for a
suboptimal decoding, in which the noise level is not required,
has been derived. It can be shown that for moderate, as well as
low, noise levels the performance of the suboptimal decoding
is very close to that of the maximum likelihood decoding. The
latter statement is supported by computer simulation.
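
The conditional probability and the resulting maximum likelihood metric are straightforward to evaluate numerically. The Python sketch below is illustrative: the function names are hypothetical, and it covers only the noise-dependent metric $\log Q(-\delta_{ija}/\sigma)$, not the suboptimal metric, whose form is not given in this summary.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def output_probability(delta, sigma):
    """Pr[r_j | s_i, a] = Q(-delta/sigma).

    delta is the signed distance from s_i + a to the threshold of the
    decision region of r_j (negative when the sum lies outside it).
    """
    return q_function(-delta / sigma)

def ml_metric(delta, sigma):
    """Log-likelihood metric log Pr[r_j | s_i, a]; depends on the noise level."""
    return math.log(output_probability(delta, sigma))
```

A sum well inside the decision region (large positive `delta`) gives a probability close to one and a metric close to zero, while a sum outside the region is penalized increasingly heavily as `sigma` shrinks.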

An upper bound on the error rate at the output of the
component code's decoder was derived. For instance, let the
component code $C_1$ be a convolutional code. Let $\{a_d\}$ be the
set of error coefficients, used for evaluating the bit error rate
of a convolutional code [2]. It was proved that the average bit
error rate, contributed by the decoder for $C_1$, is bounded by
the following upper bound

$P_b \le \sum_d a_d D^d$,

where  D  — 

j  E  y /E  ]Q(^)  £  PrKM^da, 

3  V  *0  *1 

$i_0$ and $i_1$ are even and odd labels, respectively, and $f_a(a)$ is
the probability density function of the interference $a$. Note
that $D$ can easily be evaluated numerically.

Abstract

Differential phase shift keying (DPSK) is widely used
in communication systems where simplicity and robustness
are desired. One such system is slow frequency hopped
DPSK (SFH/DPSK) which can sustain a much higher data
rate than a fast frequency hopped system while having the
same hop rate.

In  the  detection  of  SFH/DPSK,  differentially  coherent 
detection  is  often  employed.  This  is  because  it  is  impossible 
to  maintain  the  phase  coherence  between  different  hops. 
Differentially coherent detection can take advantage of
phase coherence within a hop and thus outperforms noncoherent
detection. In this paper we present a study of the
probability distribution of a received differential phase perturbed
by tone jamming and Gaussian noise. The intent is
to study the effects of jamming against SFH/DPSK and to
provide some tools for the analysis of such a system.

In much previous work, the performance of SFH/DPSK
has been considered [1-5]. Simon has analyzed the performance
of SFH/DPSK under multiple continuous tone jamming
for a specific set of signal phases and equally spaced
decision regions. The analytical results were obtained by
ignoring the system thermal noise so that the derivation
relied largely on geometric relations. Gong analyzed the
performance of a specific binary SFH/DPSK scheme in both
tone and noise interference. In [1], [2], Wang, et al., presented
a method to derive the general probability distribution
for arbitrary DPSK signals. In this paper, we give an
alternative but simple expression of the general probability
distribution of a received differential phase corrupted by
continuous tone jamming and Gaussian noise. The probability
distributions of the received differential phase corrupted
by either continuous tone jamming or Gaussian noise are
special forms of it. Thus this result is a generalization of the
previous well known results by Pawula, Rice and Roberts.
These results are derived by making use of an unconventional
approach which relates the desired probability to
a functional of the joint characteristic function of a narrowband
waveform. Our starting point is the basic relation

r  (0)  =  +y2  +  2xycosQ^-^dxdy  (1) 

where $\Gamma(\theta)$ is a periodic sawtooth function of period $2\pi$
defined as

$\Gamma(\theta) = \theta$,  $-\pi < \theta < \pi$
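
For numerical work the sawtooth function can be evaluated with a modulo operation. The one-line Python helper below is a sketch; the name `sawtooth` and the choice to return $-\pi$ at odd multiples of $\pi$ (where the definition above leaves the value open) are assumptions.

```python
import math

def sawtooth(theta):
    """Periodic extension, with period 2*pi, of Gamma(theta) = theta on (-pi, pi).

    Assumption: odd multiples of pi, where the sawtooth jumps, map to -pi.
    """
    return (theta + math.pi) % (2.0 * math.pi) - math.pi
```

For example, an argument of $3\pi/2$ wraps to $-\pi/2$, and adding any multiple of $2\pi$ leaves the value unchanged up to rounding.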


Then the relation between the joint characteristic function
and the probability $P\{\psi_1 < \psi < \psi_2\}$ (see [5] for definition)
can be derived. The final result can be given by a simple
expression in terms of Marcum's Q-function as follows.

$P\{\psi_1 < \psi < \psi_2\} = G(\psi_2) - G(\psi_1)$  (2)

The auxiliary function $G(\psi)$ has the form:


where 

S  =  1  -  cos0cos\{/;  T  =  1  -  cosGcos  (AO  -  \\f) 

q  (a,  b)  =  1  -Q(a,b)  (4) 

To illustrate the application of these results, we analyze
the error probability performance of a general uncoded
SFH/DPSK signal under worst case tone jamming and
Gaussian noise. Skewed differential phases with unequal
decision regions, and the error performance when there is a
frequency offset between the jamming tone and the DPSK carrier,
are also considered.

Abstract - For an on-off slotted dynamic jamming
game, a relation between the steady state solutions and
cyclotomic cosets is established.


I.  The  Dynamic  Jamming  Game  Model 

An on-off slotted communication jamming game is modeled as
a two-person zero-sum non-cooperative dynamic game [1]
played over $T$ uniform time slots between a communicator (a
transmitter-receiver pair) and a jammer. In slot $t$ the communicator's
power level $X_t$ is randomly distributed over $\{0, P\}$, and
the jammer's power level $Y_t$ over $\{0, J\}$ ($P, J > 0$). Each player
has knowledge of its own and the opponent's previous plays.

The payoff function $\mathcal{J}$ is given by $\mathcal{J} = \frac{1}{T} \sum_{t=1}^{T} E[f(X_t, Y_t)]$,
where $f(X_t, Y_t)$ is the normalized payoff to the communicator.

Let $Z_t$ be a measure of the communicator's past energy
accumulation at the beginning of slot $t$, and $\delta_1$ ($0 < \delta_1 < 1$)
the communicator's thermal memory constant. The relation
$Z_t = X_{t-1} + \delta_1 Z_{t-1}$, for $t = 2, \ldots, T$, with $Z_1 = 0$, holds. There
is a power constraint $X_t + \delta_1 Z_t \le \bar{P}$ for all $t$. The jammer is
subject to similar constraints determined by analogous quantities
$W_t$, $\delta_2$ and $\bar{J}$. The transmitter parameters $\delta_1$, $\bar{P}$, $\delta_2$ and
the payoff matrix are known to both players. The strategies
are defined as

$p_t(x|z, w) = \mathrm{Prob}(X_t = x \mid Z_t = z, W_t = w)$,  (1a)

$q_t(y|z, w) = \mathrm{Prob}(Y_t = y \mid Z_t = z, W_t = w)$.  (1b)

The optimal strategies can be found by dynamic programming.
Denoting $S_t^*(z, w)$ to be the optimum accumulated payoff
at time $t$ given the past energy accumulations $z$ and $w$,
we obtain an evolution equation which gives $S_{t-1}^*(z, w)$ in terms
of $S_t^*(z, w)$ for $t = T$ down to 2.


II.  M  x  N  Grid  Solutions 

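On the $(z, w)$ plane $S_T^*(z, w)$ changes its value at most once along
the $z$-axis at $z = (\bar{P} - P)/\delta_1$ and along the $w$-axis at
$w = (\bar{J} - J)/\delta_2$. We call $(\bar{P} - P)/\delta_1$ a critical point (c.pt.)
on the $z$-axis, and $(\bar{J} - J)/\delta_2$ a c.pt. on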

the  w-  axis.  As  time  goes  backward  in  the  evolution  equation, 
the  operating  points  (<5X ,  jj)  and  (82,  7)  for  which  the  c.pts.  of 
S[  (z,  w )  do  not  increase  indefinitely  with  reverse  time  give  rise 
to  steady  state  solutions. 

Consider the communicator's case. Let $U(T) = \{(\bar{P} - P)/\delta_1\}$
denote the c.pt. set at time $T$ along the $z$-axis. Using the power
constraints, an operator $O$ which maps the communicator's energy
accumulation $z$ at time $t$ to that at $t - 1$ is defined as

$O(z) = \begin{cases} z/\delta_1 & \text{if } 0 \le z < P, \\ (z - P)/\delta_1 & \text{if } P \le z \le \bar{P}. \end{cases}$  (2)


Then $U(t - 1) = U(t) \cup O(U(t)) \cap [0, \bar{P}]$. For a steady state
solution with $M - 1$ c.pts., we force the condition that $U(t) =
\{a_1, \ldots, a_{M-1}\} = U(t - 1)$ which gives the c.pt. generation
system

$a_{c_1} = O(\bar{P})$,

$a_{c_i} = O(a_{c_{i-1}})$,  $i = 2, \ldots, M - 1$.  (3)

Here  [cl5 . . . ,  c.M- 1],  the  c.pt.  index  vector ,  is  a  permutation  of 
[1, . . . ,  M—  1].  The  operating  condition  is  given  by  >  P, 

aCM~s['  P"  ^  tke  jammer  also  has  N—  1  c.pts.  on  the  w- axis, 
then $S_t^*(z, w)$ will have an $M \times N$ grid structure on the $(z, w)$ plane
and the game will have a steady state $M \times N$ grid solution.
The number of such solutions is related to cyclotomic cosets.

A full cyclotomic coset mod $(2^M - 1)$ can be written as an
$M$-tuple $(\nu_1, \ldots, \nu_M)$, where $\nu_1 < \cdots < \nu_M$, or, alternatively, as
$(\nu_M, \nu_{c_1}, \ldots, \nu_{c_{M-1}})$, where the coset index vector $[c_1, \ldots, c_{M-1}]$
is a permutation of $[1, \ldots, M - 1]$ for which we obtain the coset
generation system

$\nu_{c_1} = 2\nu_M \bmod (2^M - 1)$,

$\nu_{c_i} = 2\nu_{c_{i-1}} \bmod (2^M - 1)$,  $i = 2, \ldots, M - 1$,  (4)


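which has the same form as (3). Let an operator $T_M$ be defined
as

$T_M(\nu) = \begin{cases} 2\nu & \text{if } 1 \le \nu \le 2^{M-1} - 1, \\ 2\nu - (2^M - 1) & \text{if } 2^{M-1} \le \nu \le 2^M - 2. \end{cases}$  (5)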

Comparing (3) with (4) we find that the following isomorphisms
hold for each index vector $[c_1, \ldots, c_{M-1}]$:

$T_M \leftrightarrow O$,  $(\nu_M, \nu_{c_1}, \ldots, \nu_{c_{M-1}}) \leftrightarrow (\bar{P}, a_{c_1}, \ldots, a_{c_{M-1}})$.


Thus the number of c.pt. generation systems for any natural
number $M \ge 2$ equals the number $h(M)$ of full cyclotomic
cosets mod $(2^M - 1)$ given by $h(M) = \frac{1}{M} \sum_{d|M} \mu(d)\, 2^{M/d}$, where $\mu$
is the Möbius function of number theory. The jammer's case
is analogous.
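
The count $h(M)$ can be checked against a direct enumeration of the cosets generated by repeated doubling mod $(2^M - 1)$. The Python sketch below is illustrative only; the function names are hypothetical, and the Möbius sum used is the standard count of full-size cyclotomic cosets.

```python
def mobius(n):
    """Mobius function mu(n) by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # squared prime factor
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result

def h(m):
    """Number of full cyclotomic cosets mod (2^m - 1) via the Mobius sum."""
    total = sum(mobius(d) * 2 ** (m // d) for d in range(1, m + 1) if m % d == 0)
    return total // m

def full_cosets(m):
    """Enumerate the cosets {v, 2v, 4v, ...} mod (2^m - 1) that have size m."""
    modulus = 2 ** m - 1
    seen, cosets = set(), []
    for v in range(1, modulus):
        if v in seen:
            continue
        coset, w = [], v
        while w not in coset:
            coset.append(w)
            w = (2 * w) % modulus
        seen.update(coset)
        if len(coset) == m:
            cosets.append(sorted(coset))
    return cosets
```

For $M = 3$ the enumeration returns the two cosets {1, 2, 4} and {3, 5, 6}, in agreement with $h(3) = 2$.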

For given $M$ and $N$, there are $h(M)$ c.pt. generation systems
for the communicator and $h(N)$ for the jammer. Therefore the
game has $h(M) \cdot h(N)$ different $M \times N$ grid solutions. Since
$h(2) = 1$, there is a unique $2 \times 2$ grid solution [2].

Abstract - A probability distribution function of the
duration of a search for a fixed pattern in random
data is derived, in terms of bifix analysis of a
pattern.

I.  Introduction 


The  expressions  (4)  and  (5)  can  be  easily  proven  using  (2)  and 
(3). 

The probability distribution functions for the bifix-free binary
pattern 01011 and "all zeros" pattern of the same length (for


According to the classical well-known paper [1], the expected
duration of a search for a fixed $L$-ary pattern of length $n$ in a
sequence of random $L$-ary equiprobable data is:

$E\{x\} = \sum_{i=0}^{n} h_i \cdot L^i - n$  (1)

where $h_i$, $i = 0, \ldots, n$ represent the bifix indicator with the following
meaning: $h_i = 1$ if a bifix (a sequence that is both prefix and
suffix) of length $i$ exists; otherwise, $h_i = 0$, and by convention
$h_0 = h_n = 1$. This formula is unavoidable for any research
considering synchronization processes (e.g. [2, 3, 4, 5]).

This  paper  presents  an  extension  to  the  research  given  in  [1],  as 
it  gives  the  formula  for  probability  distribution  function  upon 
which  the  expected  duration  and  variance  of  the  same  process, 
as  well  as  the  higher  moments,  can  be  evaluated. 


II.  Results 

The probability that the $n$-digit pattern will occur for the first
time at the position $k$ within the stream of random data
equals:

$\Pr\{x = k\} = a_k \cdot p^{k+n-1}$  (2)

where $a_k$ is expressed using a recursion:

ak  =  O) 

/=1 

and  where  p  =  1/L  is  the  probability  of  a  random  equiprobable 
digit. 
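
The distribution in (2) can be explored numerically. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and the recursion for $a_k$ is one derived here by counting the length-$(k+n-1)$ strings that end with the pattern and subtracting those whose first occurrence starts earlier. It is not claimed to be the same form as recursion (3).

```python
def bifix_indicators(pattern):
    """h_0..h_n: h_i = 1 if the length-i prefix equals the length-i suffix."""
    n = len(pattern)
    middle = [int(pattern[:i] == pattern[n - i:]) for i in range(1, n)]
    return [1] + middle + [1]   # convention: h_0 = h_n = 1

def expected_duration(pattern, L):
    """Formula (1): sum of h_i * L^i over i = 0..n, minus n."""
    h, n = bifix_indicators(pattern), len(pattern)
    return sum(h[i] * L ** i for i in range(n + 1)) - n

def first_occurrence_counts(pattern, L, kmax):
    """a[k] = number of strings of length k+n-1 whose first match starts at k."""
    n, h = len(pattern), bifix_indicators(pattern)
    a = [0] * (kmax + 1)
    far = 0  # earlier matches that do not overlap the one starting at k
    for k in range(1, kmax + 1):
        if k - n >= 1:
            far = far * L + a[k - n]
        # earlier matches overlapping the one at k need a bifix of the right length
        near = sum(a[i] * h[n - k + i] for i in range(max(1, k - n + 1), k))
        a[k] = L ** (k - 1) - far - near
    return a

def distribution(pattern, L, kmax):
    """Pr{x = k} = a_k * p^(k+n-1) for k = 1..kmax, with p = 1/L."""
    n = len(pattern)
    a = first_occurrence_counts(pattern, L, kmax)
    return [a[k] / L ** (k + n - 1) for k in range(1, kmax + 1)]
```

For the binary patterns 01011 and 00000, `expected_duration` gives 28 and 58 respectively, and the mean of the truncated `distribution` approaches the same values as `kmax` grows.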


Expression (2) is the probability distribution function so it
satisfies the condition:

$S\{x\} = \sum_{i=1}^{\infty} \Pr\{x = i\} = \sum_{i=1}^{\infty} a_i \cdot p^{i+n-1} = 1$.  (4)

Variance of a duration of a search for the fixed pattern in
random data can be found by statistical methods:

$\sigma^2 = \sum_{i=1}^{\infty} i^2 \cdot \Pr\{x = i\} - E^2\{x\} = (E\{x\} + n) \cdot (E\{x\} + n - 1) - 2 \cdot \sum_{i=0}^{n} i \cdot h_i \cdot L^i$,  (5)

while performing the summation $E\{x\} = \sum_{i=1}^{\infty} i \cdot \Pr\{x = i\}$,
formula (1) is obtained.


which  =  1,  f  =  0,...,n)  is  plotted  in  Fig.  1,  dashed  lines 
representing  the  simulation  study,  simulation  being  performed 
over  the  sample  of  100000  searches. 
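The two closed-form expressions can be checked numerically. The sketch below is illustrative only: it assumes that formula (1) reads E{x} = sum of h_i·L^i for i = 0..n, minus n, and that formula (5) reads (E{x}+n)(E{x}+n-1) - 2·sum of i·h_i·L^i. The function names are hypothetical and do not come from the paper.

```python
def bifix_indicators(pattern):
    """Return [h_0, ..., h_n] for a pattern given as a string or sequence."""
    n = len(pattern)
    h = [0] * (n + 1)
    h[0] = h[n] = 1  # convention stated in the text
    for i in range(1, n):
        # a bifix of length i exists when the length-i prefix equals the length-i suffix
        if pattern[:i] == pattern[n - i:]:
            h[i] = 1
    return h


def expected_duration(pattern, L=2):
    """Assumed form of (1): E{x} = sum_i h_i * L**i - n."""
    n = len(pattern)
    h = bifix_indicators(pattern)
    return sum(h[i] * L**i for i in range(n + 1)) - n


def variance(pattern, L=2):
    """Assumed form of (5): (E+n)(E+n-1) - 2 * sum_i i * h_i * L**i."""
    n = len(pattern)
    h = bifix_indicators(pattern)
    e = expected_duration(pattern, L)
    return (e + n) * (e + n - 1) - 2 * sum(i * h[i] * L**i for i in range(n + 1))


for p in ("01011", "00000"):
    print(p, bifix_indicators(p), expected_duration(p), variance(p))
```

Under these assumptions the bifix-free pattern 01011 is found much sooner on average than the all-zeros pattern of the same length, which agrees with the qualitative comparison made for Fig. 1.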


Fig.  1  Probability  distribution  function  for  5-bit  patterns 


III.  Conclusion 

The probability distribution function derived in this paper may be useful for any research considering the search problem. In some previous studies it was obtained either by simulation (e.g. [5]) or by visual inspection for each particular pattern (e.g. [6]).
Abstract — The Goethals code is a binary nonlinear
code of length $2^{m+1}$ which has $2^{2^{m+1}-3m-2}$ codewords
and minimum Hamming distance 8 for any odd $m \ge 3$.
We construct new codes over $Z_4$ such that their Gray
maps lead to codes with the same weight distribution
as the Goethals codes and the Delsarte-Goethals
codes.


I.  Introduction 

Let $Z_4$ denote the integers modulo 4 and let $R$ be a Galois ring of characteristic 4 with $4^m$ elements. The multiplicative group of units in $R$ contains a unique cyclic group of order $2^m - 1$. Let $\beta$ be a generator for this subgroup and let $T = \{0, 1, \beta, \beta^2, \ldots, \beta^{2^m-2}\}$.

The Gray map $\phi$ is defined by $\phi(0) = 00$, $\phi(1) = 01$, $\phi(2) = 11$ and $\phi(3) = 10$. From any $(n, M)$ code over $Z_4$ the Gray map gives in a natural way a binary $(2n, M)$ code. The Lee weight distribution of the code over $Z_4$ equals the Hamming weight distribution of its binary Gray map.
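The weight-preserving property of the Gray map can be illustrated with a few lines of Python. This is a minimal sketch; the helper names are hypothetical and words over the integers modulo 4 are represented as plain lists of integers.

```python
# Gray map on single symbols of Z_4, as defined in the text
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
# Lee weights of the symbols 0, 1, 2, 3
LEE = {0: 0, 1: 1, 2: 2, 3: 1}


def gray_map(word):
    """Map a length-n word over Z_4 to a binary word of length 2n."""
    bits = []
    for symbol in word:
        bits.extend(GRAY[symbol % 4])
    return bits


def lee_weight(word):
    return sum(LEE[s % 4] for s in word)


def hamming_weight(bits):
    return sum(bits)


word = [0, 1, 2, 3, 3, 2]
print(gray_map(word), lee_weight(word), hamming_weight(gray_map(word)))
```

Because the equality holds symbol by symbol, it holds for every word, which is why the Lee weight distribution of a quaternary code carries over to the Hamming weight distribution of its binary image.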

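Let $C_1$ be the binary code defined by $C_1 = \phi(\mathcal{C}_1)$, where $\mathcal{C}_1$ is the linear code over $Z_4$ with parity-check matrix given by

$$H_1 = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & \beta & \beta^2 & \cdots & \beta^{2^m-2} \\ 0 & 2 & 2\beta^3 & 2\beta^6 & \cdots & 2\beta^{3(2^m-2)} \end{bmatrix}.$$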


Hammons, Kumar, Sloane, Calderbank and Solé [1] have shown that if $m$ is odd, then $C_1$ is a nonlinear binary $(2^{m+1}, 2^{2^{m+1}-3m-2}, 8)$ code. This code has the same weight distribution as the Goethals code. They also showed that $\phi(\mathcal{C}_1^{\perp})$ is a Delsarte-Goethals code.


II.  Main  Results 

The main result is to show that we can construct many codes $\mathcal{C}_k$ over $Z_4$ with the same Lee weight distribution as $\mathcal{C}_1$. In particular, this implies that the Hamming weight distribution of $C_k = \phi(\mathcal{C}_k)$ is the same as for $C_1$ and therefore identical to the weight distribution of the Goethals code. From the MacWilliams identities and from the results of Hammons, Kumar, Sloane, Calderbank and Solé [1] it follows that $\phi(\mathcal{C}_k^{\perp})$ has the same Hamming weight distribution as the Delsarte-Goethals code $\phi(\mathcal{C}_1^{\perp})$.

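Theorem Let $m \ge 3$ be odd and $\gcd(k, m) = 1$. Then any code $\mathcal{C}_k$ with parity-check matrix $H_k$ given by

$$H_k = \begin{bmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & \beta & \beta^2 & \cdots & \beta^{2^m-2} \\ 0 & 2 & 2\beta^{2^k+1} & 2\beta^{(2^k+1)2} & \cdots & 2\beta^{(2^k+1)(2^m-2)} \end{bmatrix}$$

has the same Lee weight distribution as $\mathcal{C}_1$ (i.e., independent of $k$).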

¹This work was supported in part by the Norwegian Research
Council under Grant Numbers 107542/410 and 107623/420 and the
National Science Foundation under Grant Number NCR-9016077.


Sketch of proof. We give an explicit derivation for the weight distribution of these codes via exponential sums. In the following we will give a brief sketch of the main ideas behind the proof. It turns out to be natural to study the Lee weight distribution of $\mathcal{C}_k^{\perp}$.

The Lee weight of $a \in Z_4$ and the real part of $i^a$ are related by $w_L(a) = 1 - \Re(i^a)$, where $i = \sqrt{-1}$. Hence, the Lee weight of $c = (c_0, c_1, \ldots, c_{n-1}) \in Z_4^n$ is related to (the real part of) an exponential sum as follows:

$$w_L(c) = n - \Re\Big(\sum_{t=0}^{n-1} i^{c_t}\Big).$$
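The relation between the Lee weight and the real part of a sum of powers of i can be checked directly. The sketch below assumes the relation reads w_L(c) = n - Re(sum of i^{c_t}); the function names are illustrative only.

```python
# Exact powers of i = sqrt(-1), indexed by the exponent modulo 4
POWERS_OF_I = [1, 1j, -1, -1j]
LEE = [0, 1, 2, 1]  # Lee weights of 0, 1, 2, 3


def lee_weight_direct(word):
    return sum(LEE[c % 4] for c in word)


def lee_weight_via_exponential_sum(word):
    """n minus the real part of sum_t i**c_t."""
    total = sum(POWERS_OF_I[c % 4] for c in word)
    return len(word) - complex(total).real


word = [0, 1, 2, 3, 2, 2, 1]
print(lee_weight_direct(word), lee_weight_via_exponential_sum(word))
```

The symbol-wise identity 1 - Re(i^a) gives 0, 1, 2, 1 for a = 0, 1, 2, 3, so summing over the coordinates reproduces the Lee weight of the whole word.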

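Let $\mathrm{Tr}$ be the trace mapping from the Galois ring $R$ to $Z_4$. Let $c(a, b)$ be a vector of length $n = 2^m$ indexed by $x \in T$ such that

$$c(a, b) = \mathrm{Tr}(ax + 2bx^{2^k+1}), \quad a \in R,\ b \in T.$$

Let $\mathbf{1}$ denote the all-one vector of length $2^m$. Then

$$\mathcal{C}_k^{\perp} = \{u\mathbf{1} + c(a, b) \mid a \in R,\ b \in T,\ u \in Z_4\}.$$

Let $u \in Z_4$, $a \in R$ and $b \in T$ and define

$$S(a, b, u) = i^u \sum_{x \in T} i^{\mathrm{Tr}(ax + 2bx^{2^k+1})}.$$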

The main part of the proof is to determine the values and the number of occurrences of each value for the real part of this exponential sum. It turns out that this distribution is independent of $k$ when $\gcd(k, m) = 1$. Hence from the relation between the exponential sum and the Lee weight of the codewords in $\mathcal{C}_k^{\perp}$, we conclude that the Lee weight distribution is independent of $k$ and coincides with the distribution for the Delsarte-Goethals codes that can be found in Chapter 15 in MacWilliams and Sloane [3].

In Kumar, Helleseth, Calderbank and Hammons [2] large families of quaternary sequences with good correlation properties were constructed from the codes $\mathcal{C}_1^{\perp}$. We can also construct families of quaternary sequences with similar properties from $\mathcal{C}_k^{\perp}$.

References 

[1] R. Hammons, P.V. Kumar, N.J.A. Sloane, R. Calderbank and P. Solé, "The Z4-Linearity of Kerdock, Preparata, Goethals, and Related Codes," IEEE Trans. on Inform. Theory, vol. 40, pp. 301-319, 1994.

[2] P.V. Kumar, T. Helleseth, R. Calderbank and R. Hammons, "Large Families of Quaternary Sequences with Low Correlation," to appear in the IEEE Trans. on Inform. Theory.

[3] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, (North-Holland, New York, 1977).




A  Cyclic  [6,3,4]  Group  Code  and  the  Hexacode  Over  GF(4) 

Moshe Ran and Jakov Snyders¹

Dept. of Electrical Engineering-Systems, Tel Aviv University, Tel Aviv 69978, Israel.


Abstract — A [6,3,4] code $H_6$ over an Abelian group $A_4$ with four elements is presented. $H_6$ is cyclic, unlike the [6,3,4] hexacode $E_6$ over GF(4). However, $H_6$ and $E_6$ are isomorphic when the latter is viewed as a group code. $H_6$ is the smallest member of a class of [2k,k,4] cyclic and reversible codes over $A_4$.

I.  Summary 

A group code $C$ of length $n$ over an Abelian group $A$ is a subgroup of $A^n$, the n-fold direct product of $A$. The rate $k(C)$ is defined by $k(C) = \log_{|A|} |C|$, where $|X|$ stands for the cardinality of $X$. A group code $C$ of length $n$ with rate $k$ and minimum Hamming distance $d_H$ is called an $[n, k, d_H]$ code. A linear code $C$ over a field $F$ is also a group code over the additive group of $F$. It has been shown in [2] that many of the important structural properties of codes over $F$ are associated with the additive and not the multiplicative group properties of $F$.

We present a [6,3,4] group code $H_6$ over $A_4$ with $|A_4| = 4$. Let $A_4 = \{a, b, c, d\}$ be the additive group of GF(4), where $a$ is the identity element. The elements of $A_4$ are called symbols. For the purpose of describing some binary codes with the aid of $E_6$, we use various binary representations for symbols, e.g., a = 0000, b = 0101, c = 0011, d = 0110.

Let $H_6$ be the code that comprises the (symbolwise) cyclic shifts, and their sums, of (cabbba). $H_6$ is obviously a [6,3,4] group code, hence it is an MDS code. Consequently, every three coordinates in $H_6$ constitute an information set, whereby every three symbols occur exactly once (in any three fixed positions), every two symbols occur 4 times and every symbol $4^2 = 16$ times. $H_6$ is the smallest member of a class of [2k, k, 4] cyclic and reversible codes over $A_4$.
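The construction can be reproduced by brute force. The following sketch is illustrative and not taken from the paper: it assumes the symbols a, b, c, d are encoded as the integers 0, 1, 2, 3 with bitwise XOR as the group operation (so that a is the identity and b + c = d, matching the binary representations quoted above), and all function names are hypothetical.

```python
from itertools import combinations

SYMBOLS = "abcd"  # encoded as 0, 1, 2, 3; addition in A_4 is bitwise XOR


def add(u, v):
    """Symbolwise sum of two words over A_4."""
    return tuple(x ^ y for x, y in zip(u, v))


def cyclic_shifts(word):
    return [word[s:] + word[:s] for s in range(len(word))]


def generate_h6():
    """All sums of cyclic shifts of (cabbba)."""
    seed = tuple(SYMBOLS.index(ch) for ch in "cabbba")
    code = {(0,) * 6}
    for g in cyclic_shifts(seed):
        # every element has order 2, so adding each generator once closes the span
        code |= {add(c, g) for c in code}
    return code


def hamming_weight(word):
    return sum(s != 0 for s in word)


def every_triple_is_information_set(code):
    """True if the projection onto any 3 coordinates hits each triple exactly once."""
    return all(
        len({tuple(c[p] for p in pos) for c in code}) == len(code)
        for pos in combinations(range(6), 3)
    )


h6 = generate_h6()
print(len(h6), min(hamming_weight(c) for c in h6 if any(c)))
print(every_triple_is_information_set(h6))
```

With this encoding the enumeration gives 64 codewords and minimum Hamming weight 4, and the same set can be used to test closure under cyclic shift and under reversal.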

There is a unique formally self-dual [6,3,4] code over GF(4) (see [1, pp. 301-303]), called the hexacode and denoted $E_6$. No version of $E_6$ maps onto $H_6$ under any bijection $f: GF(4) \to A_4$. In fact, no cyclic [6,3,4] code over GF(4) exists. Nonetheless, if $E_6$ is viewed as a group code then it is isomorphic to $H_6$. Since (aaaaaa) $\in H_6$, $H_6$ and $E_6$ have identical coset Hamming weight distributions.

Some properties of $H_6$ are the following.

1) $H_6$ is invariant under replacement of a by d and b by c.

2) $H_6$ is invariant under cyclic permutation.

3) $H_6$ is invariant under reversal of the symbols.

4) $H_6$ is representable by a 4-section 16-state non-symmetric trellis diagram, and also by a 3-section 16-state symmetric trellis diagram.

¹This work was supported in part by the Israel Science Foundation administered by the Israel Academy of Sciences and Humanities.


5) Let

$C_0 = \{aaaaaa, cccdbd, abccba, cdabad\}$

$C'_0 = \{aaaaaa, bccbaa, ddbcaa, cbddaa\}$

$C''_0 = \{aaaaaa, aabccb, aacbdd, aaddbc\}$

and $C_1 = C'_0 + C''_0$. Then $C_1$ is a [6,2,4] code. We have $H_6 = C_0 + C_1$ and, using standard notations for group partitioning,

$H_6 = [C_2/C_1] + [C_1/C''_0] + C''_0 = C_0 + C'_0 + C''_0$.

6) $H_6$ consists of the blocks of a 3-(24,6,1) constrained design (see [3]). $H_6$ can also be represented as the union of four 2-(12,3,1) constrained designs. (A constrained design may exist for parameter values for which no conventional t-design exists. In particular, neither a 3-(24,6,1) nor a 2-(12,3,1) t-design exists.)

For $E_6$, representations similar to those of 4) - 6) apply. Also, there exists a self-dual version of $E_6$, for which a property similar to 1) holds. However, properties 2) and 3) are unshared by (all versions of) $E_6$.

We present several constructions for binary codes of length 24 derived from $H_6$. In particular, the MOG (Miracle Octad Generator) construction, by which the [24,12,8] Golay codewords are described as some set of binary images of $E_6$, applies also with $E_6$ replaced by $H_6$.

An approach to fast maximum likelihood decoding of some binary codes of length 24 may be based on $H_6$. The binary codewords are regarded as images of $H_6$. Let $z \in A^n$ be the vector obtained by symbol-by-symbol soft decoding. The neighborhood of $z$ is examined in order to identify the most likely codeword. A small list of candidate codewords is prepared by employing certain elimination rules. A substantial reduction of computational complexity is achieved in maximum likelihood soft decoding by intensively exploiting the structure of $H_6$ in the decoding procedures.

References 

[1] J.H. Conway and N.J.A. Sloane, Sphere Packings, Lattices and Groups, New York: Springer-Verlag, 1988.

[2] G.D. Forney Jr. and M.D. Trott, "The dynamics of group codes: state spaces, trellis diagrams, and canonical encoders," IEEE Trans. Inform. Theory, vol. IT-39, pp. 1491-1513, Sept. 1993.

[3] M. Ran and J. Snyders, "Constrained designs for maximum likelihood soft decoding of RM(2,m) and the extended Golay codes," IEEE Trans. Communications, to be published.




Decoding Binary Expansions of Low-Rate Reed-Solomon Codes Far Beyond the BCH Bound

Charles T. Retter

U.S. Army Research Laboratory, AMSRL-IS-TP, Aberdeen Proving Ground, MD 21005


Abstract  —  Binary  expansions  of  low-rate  Reed- 
Solomon  codes  typically  are  capable  of  correcting 
far  more  binary  errors  than  guaranteed  by  the  BCH 
bound  on  the  Reed-Solomon  code.  Practical  decoding 
algorithms  that  often  correct  beyond  the  true  mini¬ 
mum  distances  of  the  binary  codes  are  described. 

I. Minimum Distances

The minimum distance of an (N,K) Reed-Solomon code is given exactly by the BCH bound (N+1-K). Since low-rate Reed-Solomon codes have no binary subfield-subcodes, provided that 1 is a root of the generator polynomial, the binary expansions of many of these codes have surprisingly high minimum distances. To explore the properties of these codes, a large number of weight distributions were computed by generating codewords on a KSR1 supercomputer. All Reed-Solomon codes with parameters (31,7), (63,6), (127,5), and (255,4) were expanded using all normal bases. Then the most promising codes with parameters (31,8), (63,7), (127,6), and (255,5) were examined. A total of 3064 codes containing almost 50 trillion codewords were generated.
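The relation between a Reed-Solomon code and its binary expansion is simple arithmetic: an (N,K) code over GF(2^m) with N = 2^m - 1 becomes a binary (mN, mK) code, and the BCH bound on the symbol code is N+1-K. The helper below is a hypothetical illustration of that bookkeeping, not part of the original study.

```python
def binary_expansion_parameters(N, K):
    """Return (binary length, binary dimension, BCH bound) for an (N, K) RS code.

    Assumes a primitive-length code, N = 2**m - 1, over GF(2**m).
    """
    m = (N + 1).bit_length() - 1
    if 2**m - 1 != N:
        raise ValueError("N must be of the form 2**m - 1")
    return m * N, m * K, N + 1 - K


for N, K in [(31, 7), (63, 6), (127, 6), (255, 5)]:
    print((N, K), binary_expansion_parameters(N, K))
```

For example, the (255,5) code expands to a binary (2040,40) code whose guaranteed distance from the BCH bound is only 251.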


RS        Binary       Worst               Best    BCH
Codes     Codes        dmin     dmin       dmin    Bound

(31,7)    (155,35)     40       40.944     44      25
(63,6)    (378,36)     84       123.690            58
                       320      359.405
                                863.402
                       32       32.000     32      24
                                128.000            57
(127,6)   (889,42)     352      352.000    352
(255,5)   (2040,40)    884      884.000    884     251

II.  Decoding 

A  conventional  decoder  for  these  codes  would  map  binary 
m-tuples  into  symbols  in  GF(2m)  and  decode  using  one 
of  the  standard  Reed-Solomon  decoding  algorithms.  This 
approach  will  decode  correctly  only  if  the  number  of  sym¬ 
bol  errors  is  less  than  (N+l-K)/2.  Although  not  every  bit 
error  becomes  a  symbol  error,  this  approach  cannot  take 
advantage  of  the  true  capabilities  of  these  binary  codes. 

At  AAECC-3,  Bossert  and  Hergert  [1,  2]  suggested 
a  simple  approach  to  decoding  linear  codes,  based  on  a 
very  large  syndrome  formed  by  using  all  of  the  minimum- 
weight  codewords  in  the  dual  as  parity  checks.  The  ob¬ 
servation  that  the  weight  of  this  large  syndrome  increases 
with  the  number  of  errors  suggests  various  simple  algo¬ 
rithms  to  search  for  the  nearest  codeword  by  reducing  the 
weight  of  the  syndrome.  The  most  important  requirement 
for  this  algorithm  is  that  the  dual  must  contain  a  large 
number  of  codewords  with  very  low  weights,  which  is  the 
case for the codes described above. For example, the weight distributions of the duals of 2304 binary (2040,32) codes were computed, and 99% of the duals were found to have dmin = 5, with one code having 1142808 words of that weight. However, a direct implementation of the Bossert-Hergert algorithm tends to stop at local minima when all of the minimum-weight words are used as checks. By varying the set of checks on successive passes, the local minima can be avoided. Simulations of several variations of this modified algorithm showed that far more errors can be corrected than with conventional Reed-Solomon decoders. The figure below shows the failure rates of five decoders for the code described above (dmin = 768).


[Figure: decoder failure rate versus number of errors in a block of 2040 bits]


A  A  conventional  Reed-Solomon  decoder,  which  fails 
50%  of  the  time  with  165  bit  errors. 

B  A  hypothetical  binary  bounded-distance  decoder, 
which  fails  with  384  or  more  errors. 

C  A  four-pass  threshold  decoder,  which  starts  by  us¬ 
ing  all  weight-5  checks,  but  changes  the  set  of  checks 
to  avoid  local  minima.  Its  50%  failure  rate  occurs 
with  730  errors.  Since  the  code  is  quasi-cyclic,  a 
hardware  implementation  of  this  decoder  is  reason¬ 
ably  simple  and  very  fast. 

D  A  similar  decoder,  which  changes  only  the  best  bit 
on  each  of  1500  passes.  The  50%  failure  rate  for 
this  decoder  is  reached  at  763  errors. 

E  A  maximum-likelihood  decoder,  which  has  a  50% 
failure  rate  at  878  errors. 
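The idea of searching for the nearest codeword by reducing the weight of a large syndrome can be illustrated on a toy example. The sketch below is not the decoder evaluated above: it assumes a [7,4] Hamming code, uses all seven weight-4 words of its dual as checks, and flips the single best bit on each pass (in the spirit of decoder D). All names are hypothetical.

```python
from itertools import product

# Parity-check matrix of a [7,4] Hamming code (toy stand-in for the codes above)
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def dual_codewords(h):
    """All nonzero words of the dual code; here they all have minimum weight 4."""
    words = []
    for coeffs in product([0, 1], repeat=len(h)):
        if any(coeffs):
            words.append(
                [sum(c * row[j] for c, row in zip(coeffs, h)) % 2 for j in range(len(h[0]))]
            )
    return words


def syndrome_weight(r, checks):
    """Number of parity checks violated by the received word r."""
    return sum(sum(a * b for a, b in zip(r, chk)) % 2 for chk in checks)


def greedy_decode(r, checks, max_passes=10):
    """Flip, on each pass, the bit that lowers the syndrome weight the most."""
    r = list(r)
    for _ in range(max_passes):
        current = syndrome_weight(r, checks)
        if current == 0:
            break
        best_j, best_w = None, current
        for j in range(len(r)):
            r[j] ^= 1
            w = syndrome_weight(r, checks)
            r[j] ^= 1
            if w < best_w:
                best_j, best_w = j, w
        if best_j is None:  # local minimum: no single flip helps
            break
        r[best_j] ^= 1
    return r


checks = dual_codewords(H)
codeword = [1, 1, 1, 0, 0, 0, 0]
received = codeword.copy()
received[4] ^= 1
print(greedy_decode(received, checks) == codeword)
```

The early exit at a local minimum is the failure mode mentioned in the text; the remedy described there, changing the set of checks between passes, is not implemented in this sketch.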
Abstract — We propose an algorithm for linear feedback shift-register (LFSR) synthesis in the case of multiple sequences belonging to a commutative ring with identity. It is also shown how this algorithm can be applied to the decoding of cyclic codes defined over an integer residue ring $Z_q$, where $q$ is a power of a prime.

I.  Introduction 

The practical and theoretical importance of cyclic codes defined over a finite field $F_q$ is well known. Recently, cyclic codes over integer rings $Z_m$ have also been receiving special attention; reasons for that are, for instance, i) the mapping of cyclic codes over $Z_4$ into nonlinear binary codes with excellent error-correcting capabilities, and ii) the matching of these codes to MPSK modulation schemes. In this paper we extend a set of results of the paper by Feng and Tzeng [1], culminating in a decoding procedure for cyclic codes over integer rings $Z_q$, with $q$ a power of a prime. We shall be considering only those cyclic codes over $Z_q$ whose generator polynomials divide $x^n - 1$, where $n$ denotes the code length. Let $\beta \in GR(q, r)$ (the r-dimensional Galois extension ring of $Z_q$) denote a primitive root of $x^n - 1$. Suppose further that $\beta^{b + i c_1 + h c_2}$ are roots of the generator polynomial $g(x)$ of a cyclic code $C$ over $Z_q$, for $i = 1, 2, \ldots, d_0 - 1$, $h = 1, 2, \ldots, s + 1$, where $\gcd(c_1, n) = \gcd(c_2, n) = 1$. Then, $d_{min}(C) \ge d_0 + s$ (Hartmann-Tzeng (HT) bound for cyclic codes over $Z_q$).

II.  Modified  Fundamental  Iterative  Algorithm 
Let $R$ be a commutative ring with identity (CRI). Given an $M \times N$ matrix $A$ with entries in $R$, and with rank less than $N$, find the smallest $\ell$ such that the $(\ell + 1)$-th column in $A$ can be expressed as a linear combination of the previous $\ell$ columns. The solution to this problem, when the entries of $A$ lie in a field $F$, is given by the Fundamental Iterative Algorithm (FIA), as proposed in [1]. Henceforth, we follow the notation of [1]. By extending Lemma 1 [1] to the ring case, we have devised a Modified FIA, which is similar to the original FIA, except for step 4), namely,

4) if $d_{r,s} \ne 0$, then,

• a) if there exists a $d_{r,u} \in D$, for some $1 \le u < s$, and a $y$ (over $R$) satisfying $d_{r,s} - y \cdot d_{r,u} = 0$, then $C^{(r-1,s)}(x) \leftarrow C^{(r-1,s)}(x) - y \cdot C^{(u)}(x) \cdot x^{s-u}$, and return to 3a);

• b) if either there is no such $d_{r,u} \in D$, for some $1 \le u < s$, or if $d_{r,s} - y \cdot d_{r,u} = 0$ does not have a solution in $y$ (over $R$), then: i) if column $s$ is linearly independent (LI) of the previous $s - 1$ columns (up to row $r$), then $d_{r,s}$ is stored in Table $D$, $C^{(s)}(x) \leftarrow C^{(r-1,s)}(x)$, $C^{(0,s+1)}(x) \leftarrow C^{(s)}(x)$, $s \leftarrow s + 1$, $r \leftarrow 1$, and return to 2); else, ii) if $a_{h,s} + \ldots = 0$, $1 \le h \le r$, for some choice of coefficients $\alpha_i$, then $\ldots \leftarrow 1 + \alpha_1 x + \ldots + \alpha_{s-1} x^{s-1}$, and return to 2).
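Step 4) differs from the field case because the equation d_{r,s} - y·d_{r,u} = 0 need not be solvable when d_{r,u} is a zero divisor. The brute-force helper below illustrates this over an integer residue ring; it is an assumption-laden sketch (the function name and the choice of the integers modulo q as the ring are illustrative), not the authors' procedure.

```python
def solve_discrepancy_equation(d_rs, d_ru, q):
    """Return all y in Z_q with d_rs - y * d_ru = 0 (mod q)."""
    return [y for y in range(q) if (d_rs - y * d_ru) % q == 0]


# Over Z_4: a unit always gives exactly one solution ...
print(solve_discrepancy_equation(1, 3, 4))
# ... a zero divisor may give none (branch b of step 4) ...
print(solve_discrepancy_equation(1, 2, 4))
# ... or several (branch a of step 4 still applies).
print(solve_discrepancy_equation(2, 2, 4))
```

Over a field the first case is the only one that occurs, which is why the original FIA needs no such branching.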

1This  work  has  been  supported  by  CNPq,  under  grant 
301416/85-0,  and  FAPESP,  under  grant  92/4845-7,  Brazil. 


Theorem 1 The final $s$ and $C^{(r-1,s)}(x)$ obtained from the Modified FIA is the solution to the problem with minimum $s$.

III.  Extended  Berlekamp-Massey  Algorithm 
Given $t$ sequences over a CRI $R$, find a shortest LFSR that is capable of generating them, i.e., solve the linear system of equations (1) (in [1]) over $R$. This is equivalent to finding the minimum $\ell$ such that the $(\ell + 1)$-th column in matrix $S$, as in [1], can be expressed as a linear combination of the previous $\ell$ columns. Here, our main result was to extend Theorems 2 and 3, from [1], to the case when the sequences lie in a CRI, and incorporate it in the Generalized Berlekamp-Massey (BM) Algorithm for Multiple Sequences. The obtained algorithm is similar to the original one, except for Steps 2) and 3). We describe Step 2 a) and b) below, which refer to the computation of $\sigma^{(n+1,t)}(x)$ from $\sigma^{(n,t)}(x)$ when $d_n^{(t)} \ne 0$; Step 3) works in a similar way. For more details, see [2].

2b) if $d_n^{(t)} \ne 0$ then find an $m_t$ such that the equation $d_n^{(t)} - y \cdot d_{m_t}^{(t)} = 0$ has a solution in $y$ (over $R$). Then, $\sigma^{(n+1,t)}(x) = \sigma^{(n,t)}(x) - y \cdot x^{n-m_t} \cdot \sigma^{(m_t)}(x)$ and

$l_{n+1} = \max\{l_n,\ n - m_t + l_{m_t}\}$;

2c) if $l_{n+1} = \max\{l_n,\ n + 1 - l_n\}$ then go to 3); else search for a solution $D^{(n+1,t)}(x)$ with minimum possible degree $l$ in the range $\max\{l_n,\ n + 1 - l_n\} \le l < \max\{l_n,\ l_{m_t} + n - m_t\}$ such that the polynomial $\sigma^{(m_t)}(x)$ defined by $D^{(n+1,t)}(x) - \sigma^{(n,t)}(x) = x^{n-m_t} \cdot \sigma^{(m_t)}(x)$ is a solution for the first $m_t$ power sums, $d_{m_t} = -d_n$, and $\sigma_0^{(m_t)}$ is a zero divisor in $R$. If such a polynomial is found, then $\sigma^{(n+1,t)}(x) \leftarrow D^{(n+1,t)}(x)$ and $l_{n+1} \leftarrow l$.

IV. Decoding of Cyclic Codes over $Z_q$

The error-location numbers are calculated by solving the linear system of equations (20) (in [1]) with minimum possible $\nu$, via the BM Algorithm for multisequences over a CRI. In general, one has more than one minimal solution satisfying equations (20) (in [1]). However, we have shown that when $\rho(z) = z^{\nu} \sigma(z^{-1})$ has $\nu$ or more roots $z_i$ (note that $\rho(z)$ is a polynomial with coefficients in $GR(q, r)$), then each root $z_i$ differs from the corresponding correct error-location number by a zero divisor in $GR(q, r)$, for $1 \le i \le \nu$, and the error-location numbers are uniquely determined. The error magnitudes are still computed using Forney's procedure with minor changes.
Abstract — We present new bounds for the minimal length starting from which BCH codes of given minimal distance 2t + 1 have covering radius at most 2t.


I.  Introduction 

A linear $[n, k, d]$ code $C$ is said to be maximal if, for all linear $[n, k + 1]$ codes $C'$ containing $C$,

$$d(C') < d,$$

where $d(C')$ is the minimum distance of $C'$. In other words, one cannot add a coset to $C$ without decreasing its minimum distance.

The  problem  of  determining  the  length  starting  from  which 
the  t-error  correcting  BCH  code  is  maximal  amounts  to  the 
one  of  finding  the  smallest  length  for  which  its  covering  radius 
is  strictly  less  than  2t  +  1. 

The first result of this kind was derived by Tietäväinen [2], following a paper of Helleseth [1]. His bound guarantees maximality for the t-error correcting BCH code of length $n = (2^m - 1)/N$, provided $2^m > ((2t - 1)N)^{4t+2}$. Here we sharpen this bound.

For  asymptotic  results  on  covering  radius  of  BCH  codes, 
see  also  the  paper  of  Skorobogatov  and  Vladuts  [3]. 

II.  The  results 

Theorem 1 The t-error correcting BCH-code of length $2^m - 1$ over $F_2$ is maximal provided

$$2^m > 4(1 + \varepsilon(t))(t - 1)^2 (t!)^2,$$

where $\varepsilon(t)$ is a decreasing function of $t$, $\varepsilon(4) < 0.581$, $\varepsilon(5) <$
0.138,  and  e(t)  <  (t_1^,_1)  for  t  >  5. 
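To get a feeling for the lengths involved, the condition of Theorem 1 can be evaluated numerically. The sketch below assumes the condition reads 2^m > 4(1 + ε(t))(t - 1)²(t!)² and plugs in the stated upper bounds on ε(4) and ε(5); since only upper bounds on ε are used, the returned m is sufficient for the condition but not necessarily the smallest admissible value. The function name is hypothetical.

```python
from math import factorial


def sufficient_m(t, eps_upper):
    """Smallest m with 2**m > 4 * (1 + eps_upper) * (t - 1)**2 * (t!)**2."""
    threshold = 4 * (1 + eps_upper) * (t - 1) ** 2 * factorial(t) ** 2
    m = 1
    while 2**m <= threshold:
        m += 1
    return m


print(sufficient_m(4, 0.581))
print(sufficient_m(5, 0.138))
```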


Consider the system

$$\begin{cases} x_1^{N} + \cdots + x_i^{N} = b_1 y^{N} \\ x_1^{3N} + \cdots + x_i^{3N} = b_2 y^{3N} \\ \qquad\vdots \\ x_1^{(2t-1)N} + \cdots + x_i^{(2t-1)N} = b_t y^{(2t-1)N} \end{cases} \qquad (1)$$

Let $M_i$ be the number of solutions $(x_1, \ldots, x_i, y) \in (F_{2^m})^{i+1}$ of system (1), with $x_j \ne x_k$ for $j \ne k$. If, for all $(b_1, \ldots, b_t) \in F_{2^m}^t \setminus \{0\}$, there exists (at least one) $i$, $1 \le i \le 2t$, such that $M_i \ne 0$, then the covering radius of BCH(2t + 1) is less than or equal to $2t$.

To prove the maximality of BCH(2t + 1) it is sufficient to prove that, starting from a suitable length, its covering radius is less than or equal to $2t$. We are done if we can prove that, for $m$ large enough, the sum

$$\sum_{i=0}^{2t} a_i M_i \ne 0,$$

for some (2t + 1)-tuple $(a_0, \ldots, a_{2t})$.

Choosing the (2t + 1)-tuple $(a_0, \ldots, a_{2t})$ to be the coefficients of the expansion of a properly chosen polynomial of degree $2t$ in the basis of Krawtchouk polynomials, we obtain the aforementioned result.
Theorem 2 The t-error correcting BCH-code of length $n = (2^m - 1)/N$ over $F_2$ is maximal provided

$$2^m > (1 + \varepsilon_N(t)) \, ((2t - 1)N - 1)^2 \, (t!)^2,$$

where $\varepsilon_N(t)$ is a decreasing function of $t$ satisfying, for
$N > 2$, $\varepsilon_N(4) < 0.347$, $\varepsilon_N(5) < 0.008$, and $\varepsilon_N(t) <$
((2t-i)iv_i)2(t_1)2(i-2)  for  t  >  5. 


III.  Sketch  of  the  proof 

Let BCH(2t + 1) stand for the t-error correcting BCH code of length $n = (2^m - 1)/N$.
Abstract — Primitive binary BCH-codes were supposed to have binomial weight distributions for all code rates $R < 1$ when $N \to \infty$. Here it is shown that this is only true if $R \to 1$.

I.  Introduction 

The first bound on the differences between weight distributions and the binomial distribution was given in [1] for primitive binary BCH-codes of length $N = 2^m - 1$ and $d_{min} = 2t + 1$. There it was shown that for

$$0 < t < 0.1 \cdot \sqrt{N} \qquad (1)$$


III.  The  Contradiction 
For binary codes with $R \to 1$ we obviously have

$$N \to \infty \text{ and } R \to 1 \;\Rightarrow\; E(R) = 0. \qquad (3)$$

Another result from [2] yields that for a fixed rate code sequence of linear block codes with vanishing normalized minimum distance the error exponent cannot be different from zero:

$$\lim_{N \to \infty} \frac{d_{min}}{N} = 0 \;\Rightarrow\; E(R) = 0, \quad 0 < R < 1. \qquad (4)$$

Using the result from [4], one obtains for primitive binary BCH-codes (p. b. BCH) that their normalized minimum distances vanish for $N \to \infty$. Thus, we arrive at a contradiction: From the conclusion in Section II we have


and any weight (distance) $d_H$ satisfying the inequalities $2t + \nu(t) < d_H < N - 2t + \nu(t)$ with


/  x  rati 
"(<)  =  |“o: 


2<ln  (  +  4.5  t  +  0.1  In  N 


.5  In  N  -ln<  -  2.25 
the number of codewords $A_{d_H}$ of weight $d_H$ is given by

$$A_{d_H} = 2^{-(N-K)} \binom{N}{d_H} (1 + \varepsilon(N)), \quad |\varepsilon(N)| < \text{const} \cdot N^{-0.1} \qquad (2)$$

This bound was improved by several authors and led to the common opinion that the weights of long primitive binary BCH-codes are binomially distributed for all code rates $R < 1$ with $N \to \infty$. In this paper it is shown that this is only true if $R \to 1$.


II.  Binary  Block  Codes  with  Binomial  Distance 
Distributions 

Using the distance distribution method in [2] it was shown that fixed rate code sequences of binary block codes with asymptotic (in $N$) binomial Hamming distance distribution have a cutoff rate that is equal to the channel cutoff rate of the BSC and thus are asymptotically optimal according to Massey's cutoff rate criterion. Furthermore, in [3] it was shown that the error exponent of such a code family attains the BSC error exponent, if the code rate $R$ lies in the interval between the critical rate $R_{crit}$ and channel capacity. Thus, binary codes with binomial distance distribution for $N \to \infty$ have a positive error exponent for all rates up to the channel capacity of the BSC. This argument and (2) lead to the conclusion that primitive binary BCH-codes are asymptotically optimal on the BSC, i.e., have a positive error exponent $E(R)$ in the interval $(0, R_c)$, if they are decoded by a Maximum-Likelihood decoder. This conclusion is shown to be false by a contradiction based on results from [3] and [4].


$$\text{p. b. BCH} \;\Rightarrow\; E(R) > 0, \quad R < R_c < 1, \qquad (5)$$

and from (4) and Berlekamp's result in [4] follows

$$\text{p. b. BCH} \;\Rightarrow\; E(R) = 0, \quad 0 < R < 1. \qquad (6)$$

There is no contradiction if we compare expression (6) with (3):

$$\text{p. b. BCH} \;\Rightarrow\; E(R) = 0, \quad R \to 1. \qquad (7)$$

In fact, the solution of the contradiction between (5) and (6) can be obtained by analyzing the code rate $R$ of the code sequences used in [1]. Using equation (1), $m = \log_2(N + 1)$, and the well known inequality $N - K \le mt$ for BCH-codes we have

$$R > 1 - \frac{0.1 \cdot \log_2(N + 1)}{\sqrt{N}}. \qquad (8)$$

For long codes, i.e. $N \to \infty$, this result leads to $R \to 1$. Thus, only a comparison of expression (6) with (7) is possible. But for $R \to 1$ all binary codes have binomially distributed weights. We conclude that the results obtained in [1] are only valid for code rate $R \to 1$ when $N \to \infty$. Furthermore, from expression (6) it follows that the weights of these codes cannot be binomially distributed for $R < 1$ and $N \to \infty$.
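The rate argument is easy to check numerically. The sketch below assumes the lower bound reads R > 1 - 0.1·log2(N+1)/sqrt(N), which follows from t < 0.1·sqrt(N), m = log2(N+1) and N - K ≤ mt; the function name is hypothetical.

```python
from math import log2, sqrt


def rate_lower_bound(m):
    """Lower bound on the rate of a primitive BCH code of length N = 2**m - 1
    whose designed t satisfies t < 0.1 * sqrt(N)."""
    N = 2**m - 1
    return 1 - 0.1 * log2(N + 1) / sqrt(N)


for m in (6, 10, 16, 20, 30):
    print(m, 2**m - 1, round(rate_lower_bound(m), 6))
```

The printed bound climbs toward 1 as m grows, which is the observation that restricts the result of [1] to code sequences with rate tending to 1.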
Abstract — In this talk we shall discuss the algebraic
decoding  of  doubly  extended  Reed-Solomon  codes. 

I.  Introduction 

Recently, it has been shown that some doubly extended Reed-Solomon (DRS) codes have a simple encoder [1]. One way to decode a DRS code is to use a standard decoder twice, see [2, sec. 9.3]. An extension of the Berlekamp-Massey algorithm has been published, which can decode DRS codes using only one trial [3]. The aim of the talk is to demonstrate that any t-error correcting RS decoder, such as the PGZ, the BM or the Euclidean algorithm, can easily be extended to be a decoder for a DRS code. In the following we shall show this for the decoder based on the Euclidean algorithm.


II.  Decoding 

Let $c = (c_-, c_0, \ldots, c_{n-1}, c_+)$, $n \le q - 1$, be a codeword in a $(n, k, d = n - k + 1)$ DRS defined over GF(q). The two extended symbols are denoted $c_-$ and $c_+$. A parity check matrix for the code is

$$H = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 & 0 \\ 0 & 1 & \alpha & \cdots & \alpha^{n-1} & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 1 & \alpha^{d-2} & \cdots & (\alpha^{d-2})^{n-1} & 1 \end{pmatrix}$$

where $\alpha$ is a primitive element in GF(q). Let $r = (r_-, r_0, \ldots, r_{n-1}, r_+)$ be the received vector and $e = r - c$ the error vector. The $d - 1$ syndromes $S_0, S_1, \ldots, S_{d-2}$ are calculated as $H r^T = (S_0, S_1, \ldots, S_{d-2})^T$. Let $S(x) = S_0 + S_1 x + \cdots + S_{d-2} x^{d-2}$ be the syndrome polynomial. Assume that $w(e) = s \le \lfloor (d - 1)/2 \rfloor$. An error-locator polynomial $\Lambda(x) = \Lambda_0 + \Lambda_1 x + \cdots + \Lambda_s x^s$ is defined as


A(a  *)  =  0  if  ei  ^  0,  0  <  i  <  n  —  1 

Ao  =  0  if  e+^0  (1) 

A s  =  0  if  e_  7^  0 


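An error-evaluator polynomial $\omega(x)$ is defined as

$$\omega(x) = e_- \cdot \Lambda(x) + \sum_{i \in I} e_i \frac{\Lambda(x)}{1 - \alpha^i x}, \qquad (2)$$

where $I = \{i \mid e_i \ne 0,\ 0 \le i \le n-1\}$. It can be verified that
$S(x)$, $\Lambda(x)$, and $\omega(x)$ satisfy a key equation, that is

$$\Lambda(x) S(x) \equiv \omega(x) \bmod x^{d-1} \qquad (3)$$

and that, with $t = \lfloor (d-1)/2 \rfloor$,

$$\deg \Lambda(x) \le t, \quad \deg \omega(x) \le t - 1, \quad \deg \omega(x) \le \deg \Lambda(x), \quad \deg \Lambda(x) + \deg \omega(x) < d - 1. \qquad (4)$$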


A pair of polynomials $(\Lambda(x), \omega(x))$ satisfying (3) and (4) can
therefore be determined in the usual way by the Euclidean
algorithm. Based on the polynomials $\Lambda(x)$ and $\omega(x)$, the nonzero
elements of the error vector $(e_0, \ldots, e_{n-1})$ can be estimated,
since

$$e_i = -\frac{\omega(\alpha^{-i})}{\alpha^{-i} \Lambda'(\alpha^{-i})} \quad \text{if } \Lambda(\alpha^{-i}) = 0 \text{ and } 0 \le i \le n-1, \qquad (5)$$

which is the usual formula for calculating the error symbols.

The errors $e_-$ and $e_+$ can now be determined by using the
syndrome equations, that is

$$e_- = S_0 - \sum_{i=0}^{n-1} e_i, \qquad e_+ = S_{d-2} - \sum_{i=0}^{n-1} e_i (\alpha^{d-2})^i. \qquad (6)$$

Hence, once the syndromes have been calculated, a standard
RS decoder based on the Euclidean algorithm needs only to
be extended by (6) in order to be a decoder for a DRS code.

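Example. Consider a (17, 9, 9) DRS code defined over GF(16). Let
$\alpha$ be a primitive element satisfying $\alpha^4 = 1 + \alpha$, and let the
syndromes be $(S_0, \ldots, S_7) = (\alpha^{11}, \alpha^{14}, \alpha^{12}, \alpha^{12}, \alpha^5, \alpha^3, \alpha^3, \alpha^5)$.
Applying Euclid's algorithm to $S(x)$ and $x^8$, $\Lambda(x)$ and $\omega(x)$ are
found to be $\Lambda(x) = \alpha^{10} x^3 + \alpha^{13} x^2 + \alpha x$ and
$\omega(x) = \alpha^{11} x^3 + \alpha^7 x^2 + \alpha^{12} x$. The nonzero roots of $\Lambda(x)$ are
$\alpha^{-2}$ and $\alpha^{-7}$, which by (5) implies that $e_2 = \alpha^5$ and $e_7 = \alpha^9$.
Using (6), $e_- = \alpha$ and $e_+ = \alpha^3$.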
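The numbers in this example can be checked mechanically. The following is a minimal sketch, not the author's implementation: it assumes GF(16) elements are stored as 4-bit integers reduced by $x^4 + x + 1$, and the helper names (`a`, `mul`, `div`, `poly_eval`) are illustrative. It tests the key equation (3), then applies the Forney-type formula (5) and the syndrome equations (6) to the polynomials quoted above.

```python
# GF(16) built from alpha^4 = 1 + alpha; elements are 4-bit integers.
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    if v & 0x10:
        v ^= 0x13  # reduce by x^4 + x + 1
    EXP.append(v)
LOG = {v: k for k, v in enumerate(EXP)}


def a(k):
    """Return alpha**k (k may be negative)."""
    return EXP[k % 15]


def mul(x, y):
    return 0 if 0 in (x, y) else EXP[(LOG[x] + LOG[y]) % 15]


def div(x, y):
    """Divide x by a nonzero y."""
    return 0 if x == 0 else EXP[(LOG[x] - LOG[y]) % 15]


def poly_eval(p, x):
    """Evaluate p[0] + p[1]*x + ... by Horner's rule (addition is XOR)."""
    acc = 0
    for coeff in reversed(p):
        acc = mul(acc, x) ^ coeff
    return acc


S = [a(11), a(14), a(12), a(12), a(5), a(3), a(3), a(5)]  # S_0 .. S_7
LAM = [0, a(1), a(13), a(10)]    # Lambda(x), lowest degree first
OMEGA = [0, a(12), a(7), a(11)]  # omega(x), lowest degree first

# Key equation (3): Lambda(x) S(x) = omega(x) mod x^8.
prod = [0] * 8
for i, li in enumerate(LAM):
    for j, sj in enumerate(S):
        if i + j < 8:
            prod[i + j] ^= mul(li, sj)
key_equation_holds = prod == OMEGA + [0] * 4

# Formal derivative in characteristic 2 keeps only the odd-degree terms.
DLAM = [c if k % 2 == 1 else 0 for k, c in enumerate(LAM)][1:]

errors = {}
for i in range(15):
    x = a(-i)
    if poly_eval(LAM, x) == 0:
        # Formula (5); the sign is irrelevant in characteristic 2.
        errors[i] = div(poly_eval(OMEGA, x), mul(x, poly_eval(DLAM, x)))

# Equation (6): the two extended positions (d - 2 = 7 here).
e_minus, e_plus = S[0], S[7]
for i, ei in errors.items():
    e_minus ^= ei
    e_plus ^= mul(ei, a(7 * i))
```

Running the sketch locates errors at positions 2 and 7 and recovers the four error values stated in the example.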

In  this  manuscript  we  have  only  considered  random  error  cor¬ 
rection.  The  conclusion  given  here  can  be  extended  to  include 
erasure  decoding  as  well. 

References 

[1] J. M. Jensen, "A Class of Constacyclic Codes", IEEE Trans.
Inform. Theory, vol. 40, pp. 951-954, May 1994.

[2] R. E. Blahut, "Theory and Practice of Error Control Codes",
Addison-Wesley, 1983.

[3] A. Dür, "The decoding of extended Reed-Solomon codes",
Discrete Mathematics 90 (1991), pp. 21-40.


Also, $\Lambda(x)$ is a polynomial of lowest degree satisfying (3) and (4).
Abstract - We determine the generalized Hamming
weights $d_r$ for $1 \le r \le h + 2$ of a binary primitive BCH
code with minimum distance $d = 2^h - 1$. This extends
a result of van der Geer and van der Vlugt [2], [3],
who determined $d_r$ for $1 \le r \le 5$ for the triple
error-correcting primitive BCH code. We also consider
the weight hierarchy of some codes whose parity-check
polynomial is the product of two primitive
polynomials of the same degree. In particular, we have
studied some of the codes with few nonzero weights
studied by Niho.

I.  Introduction 

Let $C$ be an $[n, k, d]$ binary linear code. The support $\chi(D)$
of a subcode $D$ of $C$ is defined as the set of coordinates
which are not identically zero. The $r$-th generalized Hamming
weight of a code $C$ is defined as

$$d_r = \min\{\, |\chi(D)| : D \text{ is an } r\text{-dimensional subcode of } C \,\}.$$

The weight hierarchy of the code $C$ is $d_r$, $r = 1, 2, \ldots, k$. The
weight hierarchy is an important parameter of the code, in
particular for estimating the trellis complexity of the code.

Finding the weight hierarchy of a code is in general a very
hard problem. For BCH codes some partial results are known.
For the double error-correcting BCH code of length $n = 2^m - 1$,
it is known that $d_1 = 5$, $d_2 = 8$ and $d_3 = 10$. For the
triple error-correcting BCH code van der Geer and van der
Vlugt [2], [3] proved that $d_1 = 7$, $d_2 = 11$, $d_3 = 13$, $d_4 = 14$
and $d_5 = 15$. Our first result is a generalization of this result
and has a simple and direct proof.

Theorem 1 Let $C$ be a primitive BCH code of length
$n = 2^m - 1$ and designed distance $d = 2^h - 1$. Then

$$d_r = 2^{h+1} - 2^{h+1-r} - 1 \quad \text{for } r = 1, 2, \ldots, h + 1$$

and

$$d_{h+2} = 2^{h+1} - 1.$$

Proof. The positions of a primitive BCH code can be
indexed by the nonzero elements of $GF(2^m)$. For a primitive
BCH code of designed distance $d = 2^h - 1$, it is well known
that codewords with ones in the locations which correspond
to the nonzero vectors of an $h$-dimensional subspace of $GF(2^m)$
(considered as an $m$-dimensional vector space) have minimum
weight.

It is well known that $d_1 = 2^h - 1$ and
$d_r \ge 2^{h+1} - 2^{h+1-r} - 1$ for $1 \le r \le h + 1$. Hence it is sufficient to
find an $r$-dimensional subcode with the support given in the
theorem when $1 \le r \le h + 2$.

Let $U$ be a subspace of dimension $h + 1$. We let $c_i$ for
$i = 1, 2, \ldots, h + 2$ denote minimum weight codewords with nonzero
locations corresponding to subspaces $V_i$, $i = 1, 2, \ldots, h + 2$,
of $U$. We will show how these can be chosen such that the
subcode $D_r$ generated by $c_1, c_2, \ldots, c_r$ has support size $d_r$. To
select the subspaces $V_i$ for $i = 1, 2, \ldots, h + 2$ we first select $V_1$ to
be any $h$-dimensional subspace of $U$. Suppose $V_1, V_2, \ldots, V_{i-1}$
have been selected; then select $V_i$ to be any $h$-dimensional
subspace of $U$ not contained in $V_1 \cup V_2 \cup \cdots \cup V_{i-1}$. This is
always possible as long as $i \le h + 2$. Then it is easy to verify
that $|\chi(D_r)| = |V_1 \cup V_2 \cup \cdots \cup V_r| = d_r$, where only nonzero
vectors are counted, which completes the proof.
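The counting step at the end of the proof can be checked by brute force for small $h$. The sketch below is illustrative only and makes one specific choice that is not in the text: $U$ is taken as $GF(2)^{h+1}$, and the subspaces $V_i$ are the kernels of the $h + 1$ coordinate functionals followed by the kernel of the functional $v_0 + v_1$. The function names are hypothetical.

```python
def ghw_formula(h, r):
    """d_r from Theorem 1 for designed distance 2**h - 1."""
    if 1 <= r <= h + 1:
        return 2 ** (h + 1) - 2 ** (h + 1 - r) - 1
    if r == h + 2:
        return 2 ** (h + 1) - 1
    raise ValueError("Theorem 1 covers 1 <= r <= h + 2 only")


def union_support_size(h, r):
    """Count nonzero vectors of U = GF(2)^(h+1) lying in V_1 u ... u V_r.

    Each V_i is the kernel of a linear functional given as a bit mask;
    this particular choice of functionals is an assumption of the sketch.
    """
    dim = h + 1
    functionals = [1 << i for i in range(dim)] + [0b11]
    covered = set()
    for mask in functionals[:r]:
        for v in range(1, 2 ** dim):
            if bin(v & mask).count("1") % 2 == 0:  # v lies in the kernel
                covered.add(v)
    return len(covered)


triple_error_values = [ghw_formula(3, r) for r in range(1, 6)]
```

For $h = 3$ the formula returns 7, 11, 13, 14, 15, the values quoted above for the triple error-correcting code.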

II. On the weight hierarchy of a Niho code

We have studied the weight hierarchy of some codes whose
parity-check polynomial is the product of two binary
primitive polynomials of the same degree $m$. This is also in
general a hard problem since it includes the dual of the double
error-correcting BCH codes as a special case, where only
partial results are known. Let $m_i(x)$ denote the minimum
polynomial of $\alpha^i$, where $\alpha$ denotes an element of order $2^n - 1$.

As an example of our results on the weight hierarchy of
these codes, we present good upper bounds on the complete
weight hierarchy of a 4-weight code of length $2^n - 1$ where
$n = 2m \equiv 0 \pmod 4$ (whose weight distribution was determined
by Niho [1]).

Theorem Let $h(x) = m_1(x) m_d(x)$ be the parity-check
polynomial of a $[2^{2m} - 1,\ 4m,\ 2^{2m-1} - 2^m]$ code $C$ where $m \ge 2$
is an even integer. If $d = 2^{m+1} - 1$ then $\gcd(d, 2^{2m} - 1) = 1$
and

$$d_r \le \begin{cases}
(2^r - 1)(2^{2m-r} - 2^{m+1-r}), & 1 \le r \le m \\
(2^m - 1)(2^m - 2) + (2^{r-m} - 1)\,2^{2m-r}, & m + 1 \le r \le 2m \\
(2^m - 1)(2^m - 1) + (2^{r-2m} - 1)\,2^{3m-r}, & 2m + 1 \le r \le 3m \\
(2^m - 1)\,2^m + (2^{r-3m} - 1)\,2^{4m-r}, & 3m + 1 \le r \le 4m.
\end{cases}$$

We also give lower bounds and show that equality holds in
many cases.
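As a sanity check on the piecewise bound, the short sketch below evaluates it and confirms two consistency conditions that follow from the code parameters: the $r = 1$ value equals the minimum distance $2^{2m-1} - 2^m$, and the $r = 4m$ value equals the code length $2^{2m} - 1$. The function name is illustrative, not taken from the paper.

```python
from math import gcd


def niho_upper_bound(m, r):
    """Upper bound on d_r from the theorem, for even m >= 2 and 1 <= r <= 4m."""
    if not 1 <= r <= 4 * m:
        raise ValueError("r must satisfy 1 <= r <= 4m")
    if r <= m:
        return (2 ** r - 1) * (2 ** (2 * m - r) - 2 ** (m + 1 - r))
    if r <= 2 * m:
        return (2 ** m - 1) * (2 ** m - 2) + (2 ** (r - m) - 1) * 2 ** (2 * m - r)
    if r <= 3 * m:
        return (2 ** m - 1) * (2 ** m - 1) + (2 ** (r - 2 * m) - 1) * 2 ** (3 * m - r)
    return (2 ** m - 1) * 2 ** m + (2 ** (r - 3 * m) - 1) * 2 ** (4 * m - r)


bounds_m2 = [niho_upper_bound(2, r) for r in range(1, 9)]
```

For $m = 2$ (a $[15, 8, 4]$ code) the bounds are 4, 6, 8, 9, 11, 12, 14, 15, which is strictly increasing, as any weight hierarchy must be.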
Abstract - We prove that any $Z_4$-cyclic code has
generators of the form $(fh, 2fg)$ where $fgh = x^n - 1$ over
$Z_4$. From this we can easily find the order of the code
and generators of the dual. A particularly interesting
family of $Z_4$-cyclic codes are quadratic residue codes.
We define such codes in terms of their idempotent
generators and show that these codes also have many
good properties which are analogous in many respects
to properties of quadratic residue codes over a field.

I. $Z_4$-CYCLIC CODES

Let $Z_4$ denote the integer residues modulo 4. $Z_4$ is a ring
which has 2 as a zero divisor. A set of $n$-tuples over $Z_4$ is
called a code over $Z_4$ or a $Z_4$ code if it is a $Z_4$ module.

Let $\mu : Z_4[x] \to Z_2[x]$ be the map which sends 0, 2 to 0; 1, 3
to 1 and $x$ to $x$.

Definition: A polynomial $f$ in $Z_4[x]$ is basic irreducible if
$\mu f$ is irreducible in $Z_2[x]$; $f$ is primary if $(f)$ is a primary
ideal.

Lemma: If $x^n - 1 = f_1 f_2 \cdots f_r$, where the $f_i$ are basic
irreducible and pairwise coprime, then this factorization is unique.

Theorem 1: Let all $f_i$ be as above for an odd $n$, and let $\hat{f}_i$
denote the product of all $f_j$ except $f_i$; then the ideals $(\hat{f}_i)$ and
$(2\hat{f}_i)$, for $i = 1, 2, \ldots, r$, generate all ideals of $Z_4[x]/(x^n - 1)$.

If $f$ is a polynomial, $f^*$ denotes its reciprocal.

Theorem 2: Suppose $C$ is a $Z_4$-cyclic code of odd length
$n$, and $x^n - 1 = f_1 f_2 \cdots f_r$, where the $f_i$ are basic irreducible
and pairwise coprime; then $C = (fh, 2fg)$, where $g$ and $h$ are
coprime and $fgh = x^n - 1$, $|C| = 4^{\,n - \deg f - \deg h}\, 2^{\,n - \deg f - \deg g}$, and
$C^{\perp} = (g^* h^*, 2 g^* f^*)$.

Theorem 3: Let $C$ be as in Theorem 2. If $C = (f)$, then $C$
has an idempotent generator in $Z_4$; if $C = (2f)$, then $C = (2e)$,
where $e$ is an idempotent generator in $Z_2$; if $C = (fh, 2fg)$,
then $C = (e, 2v)$ where $fgh = x^n - 1$, $e$ is an idempotent in $Z_4$
and $v$ is an idempotent in $Z_2$.

Theorem 4: If $C = (e(x))$ where $e$ is an idempotent in $Z_4[x]$,
then $C^{\perp} = (1 - e(x^{-1}))$.

II.  Quadratic  Residue  Codes 

Quadratic residue codes are cyclic codes which can be defined
in terms of their idempotent generators [5].

Let $e_1 = \sum_{i \in Q} x^i$ and $e_2 = \sum_{i \in N} x^i$, where $Q$ is the set of
quadratic residues and $N$ is the set of quadratic nonresidues
for a prime $p \equiv \pm 1 \pmod 8$. Then $e_1$ and $e_2$ are idempotents
of binary Q.R. $[p, (p+1)/2]$ codes.

This work was supported in part by NSA grant MDA 904-91-
H-0003.


Theorem 5: Let $p$ be a prime $\equiv \pm 1 \pmod 8$ such that $p + 1$
(or $p - 1$) $= 8r$. If $r$ is odd then $e_i + 2e_j$ and $1 + 3e_i + 2e_j$ are
idempotents over $Z_4$, where $i, j = 1, 2$ and $i \ne j$.

If $r$ is even then $3e_i$ and $1 + e_i$ are idempotents over $Z_4$,
where $i = 1, 2$.
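Theorem 5 can be spot-checked numerically for small primes. The following is a brute-force sketch, not the authors' method: polynomials in $Z_4[x]/(x^p - 1)$ are stored as coefficient lists of length $p$, and all helper names are illustrative.

```python
def qr_polys(p):
    """Coefficient lists of e1 (sum over residues) and e2 (sum over nonresidues)."""
    residues = {(i * i) % p for i in range(1, p)}
    e1 = [1 if i in residues else 0 for i in range(p)]
    e2 = [1 if i != 0 and i not in residues else 0 for i in range(p)]
    return e1, e2


def cyc_mul(u, v, p, modulus=4):
    """Multiply two polynomials modulo x^p - 1 with coefficients mod `modulus`."""
    out = [0] * p
    for i, ui in enumerate(u):
        if ui:
            for j, vj in enumerate(v):
                out[(i + j) % p] = (out[(i + j) % p] + ui * vj) % modulus
    return out


def combo(const, c1, c2, e1, e2):
    """Return const + c1*e1 + c2*e2 with coefficients reduced mod 4."""
    out = [(c1 * x + c2 * y) % 4 for x, y in zip(e1, e2)]
    out[0] = (out[0] + const) % 4
    return out


def is_idempotent(e, p):
    return cyc_mul(e, e, p) == e


e1_7, e2_7 = qr_polys(7)
example = is_idempotent(combo(0, 1, 2, e1_7, e2_7), 7)  # e1 + 2*e2 for p = 7
```

For $p = 7$ (so $p + 1 = 8$, $r = 1$) one finds $e_1^2 = e_1 + 2e_2$ over the integers, which makes the idempotent property of $e_1 + 2e_2$ over $Z_4$ visible by hand.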

Definition: A $Z_4$-cyclic code is a $Z_4$-quadratic residue
(Q.R.) code if it is generated by one of the idempotents in
Theorem 5.

Theorem 6: Let $p$ be a prime and $p + 1 = 8r$ for odd $r$. If
$Q_1 = (e_1 + 2e_2)$, $Q_2 = (e_2 + 2e_1)$, $Q_1' = (1 + 3e_2 + 2e_1)$ and
$Q_2' = (1 + 3e_1 + 2e_2)$ are $Z_4$-Q.R. codes, then

(a) $Q_1$ and $Q_2$ are equivalent, $Q_1'$ and $Q_2'$ are equivalent;

(b) $Q_1 \cap Q_2 = (3h)$ and $Q_1 + Q_2 = R_p = Z_4[x]/(x^p - 1)$,
where $h$ is the all-1 vector;

(c) $|Q_1| = 4^{(p+1)/2} = |Q_2|$;

(d) $Q_1 = Q_1' + (h)$, $Q_2 = Q_2' + (h)$;

(e) $|Q_1'| = |Q_2'| = 4^{(p-1)/2}$;

(f) $Q_1'$ and $Q_2'$ are self-orthogonal and $Q_1^{\perp} = Q_1'$, $Q_2^{\perp} = Q_2'$.

Note: If $r$ is even or $p \equiv 1 \pmod 8$, there are similar results.

Theorem 7: Let $Q$ be an extended $Z_4$-Q.R. code. Then
the group of $Q$ contains a subgroup which is isomorphic to
$PSL_2(p)$.

Theorem 8: The extended $Z_4$-Q.R. code of length 32 has
minimum Lee weight 14, minimum Euclidean weight 16 and
minimum Hamming weight 8.

The extended $Z_4$-Q.R. code of length 48 has minimum
Lee weight 18, minimum Euclidean weight 24 and minimum
Hamming weight 12.

Their images under the Gray map are non-linear and have
better minimum Hamming weight than any known binary
linear codes [3].
Abstract - A recently derived upper bound for
Weil-type exponential sums over Galois rings leads
directly to an estimate for the minimum Lee distance of
$Z_4$-linear trace codes. In this paper, an improved
minimum distance estimate is presented. The improved
estimate is tight for the Kerdock code as well as for
the Delsarte-Goethals codes.

I.  INTRODUCTION 

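Let $R_m := GR(4, m)$ denote the Galois ring (char. 4) of $4^m$
elements. Let $\beta$ be an element in $R_m$ of order $2^m - 1$ and set
$T_m = \{0, 1, \beta, \beta^2, \ldots, \beta^{2^m - 2}\}$. Let $f(x) \in R_m[x]$ be
nondegenerate with weighted degree $D_f$ [1]. We define a $Z_4$-linear
weighted degree trace code $C(m, D)$ via

$$C(m, D) = \{\theta + \mathrm{Tr}(f(x)) \mid D_f \le D,\ \theta \in Z_4\}_{x \in T_m}.$$

The minimum Lee distance $d_{\min}$ of the codes $C(m, D)$ can be
shown to be

$$d_{\min} = \min\Big\{ 2^m - \Re\Big(\omega^{\theta} \sum_{x \in T_m} \omega^{\mathrm{Tr}(f(x))}\Big) \;\Big|\; \theta \in Z_4,\ f(x) \text{ nondegenerate},\ D_f \le D \Big\},$$

where $\omega = \sqrt{-1}$.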

For a more general version of Theorem 2, see [3].

We denote

$$\rho_{f,m} = \Re\Big(\sum_{x \in T_m} \omega^{\mathrm{Tr}(f(x))}\Big).$$

Let $f(x) = a(x) + 2b(x)$, $a(x), b(x) \in T_m[x]$. For any positive
integer $j$, let $w_2(j)$ denote the Hamming weight of the binary
expansion of $j$. For $g(x) = \sum_{j=0}^{n} g_j x^j \in T_m[x]$, we define
$w_2(g(x)) = \max\{w_2(j) \mid g_j \ne 0,\ 0 \le j \le n\}$. We then define
$w_2(f(x)) = \max\{2 \cdot w_2(a(x)),\ w_2(b(x))\}$. It can be shown
that  l  in  Theorem  2  satisfies  l  >  [u,2(7(Ay1  -  Thus, 

2^ w 2 1 1 2  •  p/im.  (1) 

Let  pfims  =  ^(Ex€Tms  uTrU{x))).  In  a  similar  manner, 
2  w2 (/(*))  1 12  •  pf,ms  so  that 

2L»2(?(x),Js|2  .Pf>ms.  (2) 

Using  a  result  of  Ax  and  Moreno  and  Moreno’s  adaptation  [2] 
of  Serre’s  technique,  we  have 

Theorem  3  Let  h,  =  L^j^yyJ,  ef  =  Then 


where $\Re(x)$ denotes the real part of $x$.

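In [1], Kumar et al. prove

Theorem 1

$$\Big| \sum_{x \in T_m} \omega^{\mathrm{Tr}(f(x))} \Big| \le (D_f - 1)\, 2^{m/2}.$$

Thus, $\big|\Re\big(\sum_{x \in T_m} \omega^{\mathrm{Tr}(f(x))}\big)\big| \le (D_f - 1)\, 2^{m/2}$. In this paper,
we show that this estimate can, in some cases, be strengthened
by up to a factor of $\sqrt{2}$.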


II.  IMPROVED  ESTIMATES 


Ip/, m|  < 


2h! -1  (£>/-!)  [2^0  y'gj 


2e/‘ 


J- 


Further, 


Corollary  4  Let  dmjn  be  the  minimum  Lee  distance  of  the 
code  C(m,D).  Let  e  =  min{ef\Df  <  D}  and  h  = 
min{hf\Df  <  D}.  Then 


dmin  —  ^ 


2e”1L 


2h"l(D-l)[21-h^g\ 


J- 


The bounds in Theorem 3 and Corollary 4 are in fact sharp
when applied to Kerdock and Delsarte-Goethals codes.


Define $D_1 = D$ or $D - 1$ when $D$ is odd or even, respectively.
With the code $C(m, D)$, we associate the sets
$S_1 = \{\beta^{2^i a} \mid 1 \le a \le \lfloor D/2 \rfloor,\ 0 \le i \le m - 1\}$ and
$S_2 = \{\beta^{2^i b} \mid 1 \le b \le D_1,\ 0 \le i \le m - 1\}$.
Also let $S_1^2 = \{\beta_1 \cdot \beta_2 \mid \beta_1, \beta_2 \in S_1\}$.
Define $S_D = S_1^2 \cup S_2$. Using McEliece's theorem on divisibility
of binary cyclic codes, one can show that

Theorem 2 Let $l$ be the smallest integer for which the product
of $l$ terms in $S_D$ yields 1. Then $2^{l-1}$ divides the Lee weight
of every codeword in $C(m, D)$.

1The work was supported in part by the Norwegian Research
Council for Science and the Humanities and in part by the National
Science Foundation under Grant Number NCR-93-05017.
Abstract - We point out that a large variety of
nonlinear OQPSK-type waveforms can be exactly
represented as a sum of linear OQPSK-type components.
A similar representation, with an increased number
of components, can be adopted for any signal obtained
by filtering and nonlinear amplification of the
above-mentioned waveforms.

I.  INTRODUCTION 

OQPSK-type modulation schemes are known to be well
suited for radio applications where nonlinear power
amplifiers operating close to saturation are employed. In most
cases, a linear modulation scheme is assumed and, hence,
the complex envelope of the modulated waveform, at the
amplifier input, can be given by $s_{in}(t) = \sum_n c_n x(t - nT)$,
where $x(t)$ denotes the modulation pulse, $T$ is the duration
of the bit interval and, according to the data sequence,
$c_{2i} = \pm 1$ and $c_{2i+1} = \pm j$. Whenever the envelope
$|s_{in}(t)|$ has fluctuations, the nonlinear characteristics of
the amplifier lead to some distortion. On the other hand,
if a nonlinear, constant-envelope modulation scheme is
adopted, no signal distortion appears at the amplifier output.
It is well known that a Binary-CPM scheme with
$h = 1/2$ can also be regarded as a member of a wide
OQPSK-type class, even for the "partial-response" case
(e.g., GMSK, TFM, etc.), since the corresponding modulated
waveforms are similar to those resulting for the
linear OQPSK-type schemes. In [1] Laurent has shown
that any Binary-CPM signal can be represented as a sum
of several linearly modulated signals, each of them
characterized by a specific modulation pulse, $x^{(k)}(t)$. The
required number of linear components depends on the duration
of the "frequency pulse", $g(t)$, in the standard CPM
representation. If this duration is $LT$ (for integer $L$), the
complex envelope can be written as

$$s(t) = \sum_{k=0}^{2^{L-1} - 1} \sum_n c_n^{(k)} x^{(k)}(t - nT) \qquad (1)$$

with pulses $x^{(k)}(t)$ and sequences $\{c_n^{(k)}\}$ depending on
$g(t)$, $h$ and the data sequence. For $h = 1/2$, all the linear
components of the CPM signal belong to the OQPSK-type
class: this means that, if $c_{2i}^{(k)} = \pm 1$, then $c_{2i+1}^{(k)} = \pm j$.
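The linear OQPSK-type envelope $s_{in}(t) = \sum_n c_n x(t - nT)$ from the introduction can be evaluated directly. The sketch below is only an illustration under an assumption that is not in the text: the pulse is taken to be a half-sine of duration $2T$ (the MSK pulse), for which the envelope is known to be constant. Function names and the bit pattern are hypothetical.

```python
import math


def half_sine_pulse(t, T):
    """Assumed pulse: sin(pi t / 2T) on [0, 2T], zero elsewhere."""
    return math.sin(math.pi * t / (2 * T)) if 0.0 <= t <= 2 * T else 0.0


def oqpsk_envelope(symbols, t, T=1.0, pulse=half_sine_pulse):
    """Complex envelope sum_n c_n x(t - nT) with c_2i = +-1 and c_2i+1 = +-j."""
    s = 0j
    for n, b in enumerate(symbols):
        c = complex(b, 0) if n % 2 == 0 else complex(0, b)
        s += c * pulse(t - n * T, T)
    return s


symbols = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1]
# Sample the steady-state region, where exactly two pulses overlap.
magnitudes = [abs(oqpsk_envelope(symbols, 1.0 + 0.01 * k)) for k in range(1500)]
```

With this pulse the two overlapping terms are in quadrature and have amplitudes $\sin$ and $\cos$ of the same angle, so $|s_{in}(t)| = 1$; other pulse shapes generally give a fluctuating envelope, which is the case where amplifier distortion arises.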

II.  GENERALIZED  REPRESENTATION  OF 
NONLINEAR  OQPSK-TYPE  WAVEFORMS 

Similarly to the Binary-CPM signals with $h = 1/2$, many
other OQPSK-type signals can be exactly represented as
a sum of linear OQPSK-type components. This is the case
with the signal at the output of a nonlinear power
amplifier, for an input belonging to the OQPSK-type class,
characterized by the above-mentioned complex envelope
$s_{in}(t)$. Eqn. (1) can still be valid, provided that $x(t)$ has
duration $(L+1)T$ and the power amplifier is modelled
as a bandpass memoryless nonlinearity; moreover, all the
linear components belong to the OQPSK-type class. In
this case, $c_n^{(0)} = c_n$ and, for $L \ge 2$, we have to define
sequences $\{c_n^{(k)}\}$, $k = 1, \ldots, 2^{L-1} - 1$, according to
4‘)=4o>n^,)  (2) 

where  /?£*/  =  1  if  ak,i  =  0  and  f3(nkJ  =  if 

akj  =  1,  when  (ak,L-i(*k,L-2  •  •  •  <*m)  taken  as  the 
binary representation of $k$. The calculation of the $2^{L-1}$
pulses $x^{(k)}(t)$ can easily be done by taking advantage of
the correlation properties of the sequences $\{c_n^{(k)}\}$; for an
i.i.d. input sequence, $E[c_n^{(i)} c_m^{(j)*}] = 1$ if $n = m$ and $i = j$,
and zero otherwise. Hence

$$x^{(k)}(t) = E[c_0^{(k)*} s(t)], \quad k = 0, 1, \ldots, 2^{L-1} - 1, \qquad (3)$$

where $s(t)$ can be obtained from $s_{in}(t)$ by taking into
account the AM/AM and AM/PM conversion functions of
the amplifier. If $x(t)$ occupies the interval $[0, (L+1)T]$,
the resulting "output pulses" $x^{(k)}(t)$ will occupy the
following intervals: $[0, (L+1)T]$, for $k = 0$; $[0, (L-1)T]$,
for $k = 1$; $[0, (L-2)T]$, for both $k = 2$ and $k = 3$; ...;
$[0, T]$, for $2^{L-2} \le k \le 2^{L-1} - 1$. We stress the
connections between the pulses $x^{(k)}(t)$ and the low-pass
equivalent Volterra kernels which characterize the nonlinear
transmission system [2]. Additionally, we point out the
following: if any OQPSK-type signal given by (1) (with
sequences and pulse durations defined as previously) is
filtered and then power amplified by a bandpass
memoryless nonlinearity, the resulting output signal can also be
represented as a sum of linear OQPSK-type components;
the number of output components is $2^{L+H-1}$ when the
filter impulse response has duration $HT$ (integer $H$).
I. Introduction

Guided Scrambling (GS) line codes augment the source bit
stream prior to self-synchronizing scrambling to ensure that the
scrambling process generates an encoded bit sequence with good
line code characteristics [1]. With arithmetic from the ring of
polynomials over GF(2), self-synchronizing scrambling can be
interpreted as division of the source bit sequence by the
scrambling polynomial and transmission of the resulting quotient.
When augmenting bits are inserted in fixed, periodic positions,
GS codes can be interpreted as block line codes which encode
source words to quotients. In particular, Block Guided
Scrambling (BGS) generates a transmitted bit stream which is a
concatenation of finite-length quotients chosen from sets of
quotients which represent each source word. Alternatively, in
Continuous Guided Scrambling (CGS), the transmitted sequence
appears to be a continuous quotient because the
encoder shift registers are updated following quotient selection to
contain the remainder associated with the selected quotient. The
quotient selection mechanisms of both BGS and CGS encoders
can be modeled as finite state machines with quotient sets as
input and the selected quotient as output. In CGS encoding, the
selection mechanism also outputs the remainder associated with
the selected quotient.
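The interpretation of scrambling as polynomial division over GF(2) can be made concrete with a few lines of code. This is a generic sketch of GF(2) long division, not the encoder of [1]: bit sequences are held as Python integers with the most significant bit first, and the example source word and scrambling polynomial are arbitrary illustrative values.

```python
def gf2_divmod(dividend, divisor):
    """Divide two GF(2) polynomials given as bit masks; return (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by the zero polynomial")
    quotient = 0
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        shift = dividend.bit_length() - dlen
        quotient ^= 1 << shift        # record this quotient term
        dividend ^= divisor << shift  # subtract (XOR) the shifted divisor
    return quotient, dividend


def gf2_mul(u, v):
    """Carry-less multiplication of two GF(2) polynomials."""
    out = 0
    while v:
        if v & 1:
            out ^= u
        u <<= 1
        v >>= 1
    return out


source = 0b1101011011   # hypothetical source word
scrambler = 0b10011     # hypothetical scrambling polynomial x^4 + x + 1
q, r = gf2_divmod(source, scrambler)
```

The quotient `q` plays the role of the transmitted sequence and `r` the remainder that a CGS encoder would carry forward; multiplying `q` by the scrambling polynomial and adding `r` recovers the source word.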

In  this  paper  we  describe  several  characteristics  of  GS 
encoders  and  their  coded  sequences.  We  begin  by  defining 
required  terms. 

II.  Definitions 

Let  the  complement  of  a  polynomial  p(x)  be  the  polynomial 
that  contains  the  coefficient  one  in  every  position  that  p(x) 
contains  the  coefficient  zero,  and  contains  a  zero  in  every  position 
that  p(x)  contains  a  one. 

If a quotient set $Q_i$ exists such that each quotient contained
in this set has a complement in set $Q_j$, and each quotient in $Q_j$ has
a complement in $Q_i$, we say that quotient sets $Q_i$ and $Q_j$ are
complementary. Note that a quotient set can be its own
complement.

We  also  say  that  states  in  the  Mealy  machine  model  of  the 
selection  mechanism  are  complementary  if  complementary 
quotients  are  selected  from  these  states  whenever  the  input 
quotient  selection  sets  are  complementary. 

Finally,  we  denote  a  remainder  that  is  generated  from  a 
particular  state  with  non-zero  probability  after  an  undetermined 
period  of  encoding  to  be  a  recurrent  remainder  for  that  state. 

III. Properties of GS Encoders

We now state three properties of GS encoders. Complete
derivations of these properties can be found in [2].

Property E1: Every quotient selection set has a complement.

Property E2: In all selection mechanisms proposed in [1-3]
which enforce symmetrical bounds on the running digital sum
(RDS) of the encoded bit sequence or do not restrict RDS, every
state has a complement. In general, this holds for all GS encoders
where there is symmetry in quotient selection. If the selection
mechanism can be modeled with a single state, it is its own
complement. Finally, when complementary quotients are selected
from complementary states, the next states are complementary.

Property  E3:  When  all  source  words  occur  with  non-zero 
probability  regardless  of  the  encoding  interval,  complementary 
states  in  the  CGS  encoder  selection  mechanism  have  the  same 
number  of  recurrent  remainders. 

IV.  Properties  of  GS  Encoded  Sequences 

GS  coded  sequences  exhibit  the  following  properties 
whenever  the  source  bit  stream  is  stationary  and  is  comprised  of 
words  of  any  length  in  which  the  words  are  independent  and  all 
words  have  non-zero  probability  of  occurrence.  These  properties 
also  hold  in  many  instances  when  one  or  more  of  the  source  words 
do  not  occur. 

Property S1: In CGS sequences encoded using even-weight
scrambling polynomials, zeros and ones occur with equal
probability in all bit positions. Consequently, the power spectral
density of the coded sequence can contain discrete components
only at frequencies $f = m/T$ for integers $m$. The discrete power
spectrum has the form

$$\frac{\eta^2}{T^2} \sum_{m=-\infty}^{\infty} \left| P\!\left(\frac{m}{T}\right) \right|^2 \delta\!\left(f - \frac{m}{T}\right), \qquad (1)$$

where $P(f)$ is the Fourier transform of the pulse shape, $T$ is the
duration of each coded symbol, $\eta$ is the average code symbol
amplitude given by

$$\eta = \frac{V_0 + V_1}{2},$$

and $V_0$ and $V_1$ are the values with which the symbols zero and one
are represented.

Property  S2:  If  the  scrambling  polynomial  has  odd  weight,  the 
power  spectra  of  CGS  coded  sequences  resulting  from  source 
sequences  with  complementary  statistics  are  identical.  Further, 
when  source  words  are  equiprobable,  the  discrete  spectrum  is 
given  by  Equation  1 . 

Property  S3:  When  the  source  words  are  equiprobable,  the  power 
spectral  density  is  not  affected  by  the  block  or  continuous  nature 
of  the  code,  and  the  discrete  spectrum  is  given  by  Equation  1. 
Abstract - This paper describes methods by which
the residual correlation in CELP-encoded speech can
be exploited by an appropriately designed channel
decoder. Specifically, the LSP redundancy in FS 1016
CELP is quantified and used to effect near-MAP
decoding of Reed-Solomon and convolutional codes.
Coding gains of up to 3.6 dB are obtained over
conventional ML algorithms.

Summary  of  Results 

We  consider  the  problem  of  reliably  transmitting  speech 
compressed  with  codebook-excited  linear  predictive  (CELP) 
coding  over  a  noisy  channel.  The  particular  implementation 
we  consider  is  Federal  Standard  1016  4.8  kbit/s  CELP. 

Like  all  practical  speech  encoders,  CELP  does  not  elimi¬ 
nate  all  the  redundancy  in  speech  samples;  what  remains  is 
the  “residual  redundancy”.  In  this  work,  we  consider  meth¬ 
ods  by  which  channel  codes  can  use  this  redundancy  to  en¬ 
hance  the  performance  of  CELP-encoded  speech  over  very 
noisy  channels.  Specifically,  we  describe  ways  the  residual 
redundancy  in  CELP’s  line  spectral  parameters  (LSP’s)  can 
be  quantified  and  exploited.  We  begin  by  proposing  two  mod¬ 
els  for  LSP  generation;  the  first  model  incorporates  only  the 
non- uniformity  of  the  LSP’s  and  their  correlation  within  a 
CELP  frame,  while  the  second  provides  for  correlation  be¬ 
tween  frames  as  well.  When  these  models  are  “trained”  using 
an  actual  CELP  bitstream  they  show  that  as  many  as  12.5  of 
the  30  high-order  LSP  bits  in  each  frame  may  be  redundant. 

We  next  present  decoding  algorithms  that  exploit  that  re¬ 
dundancy  via  both  convolutional  and  Reed-Solomon  codes. 
For  convolutional  codes,  we  employ  three  soft-decision  decod¬ 
ing  schemes,  all  based  on  the  Viterbi  algorithm: 

•  ML  -  the  “usual”  maximum  likelihood  algorithm; 

•  MAP  1  -  a  MAP  algorithm  that  exploits  the  redun¬ 
dancy  due  to  the  non-uniformity  of  the  LSP’s  and  their 
correlation  within  a  frame  -  about  10  bits/frame; 

•  MAP  2  -  which  exploits  the  redundancy  from  the  non- 
uniform  distribution  of  the  LSP’s  and  their  correlation 
within  and  between  frames  -  about  12.5  bits/frame. 

For  block  codes,  we  present  four  soft-decision  decoding 
(SDD)  algorithms: 

•  SDD  1  -  which  approximates  “traditional”  maximum 
likelihood  decoding  and  does  not  attempt  to  exploit  any 
of  the  residual  redundancy; 

•  SDD  2  -  which  exploits  only  the  redundancy  due  to  the 
ordered  nature  of  the  LSP’s  -  about  4.4  bits/frame; 

1The  work  of  Alajaji  and  Fuja  has  been  supported  by  the 
U.S.  Department  of  Defense  under  grant  MDA  904-94-3008.  The 
work  of  Phamdo  has  been  supported  by  NTT  Corporation. 


•  SDD 3 - which, like MAP 1, exploits the redundancy due
to the non-uniform distribution of the LSP's and their
correlation within a frame;

•  SDD  4  -  which  like  MAP  2  exploits  both  the  inter-  and 
intra-frame  correlation  and  the  redundancy  due  to  the 
non-uniform  distribution. 

Figures  1  and  2  display  the  simulated  performance  of  these 
decoders  in  terms  of  average  spectral  distortion;  the  channel 
is  AWGN  and  the  modulation  is  BPSK.  Clearly,  MAP  2  and 
SDD  4  provide  exceptional  performance  at  very  low  Eb/No. 


Fig. 1: Average spectral distortion versus Eb/N0 (dB) - convolutional code.

Fig. 2: Average spectral distortion versus Eb/N0 (dB) - Reed-Solomon code.
(HDML = hard-decision ML.)




Modal  Analysis  of  Linear  Nonbinary  Block  Codes 
Used  on  Stochastic  Finite  State  Channels 

Hans-Jürgen Zepernick1

Australian  Telecommunications  Research  Institute, 

Perth,  Western  Australia,  Australia 


Abstract - An analytical method for an exact
evaluation of the coset probabilities of algebraically
decoded linear nonbinary block codes used
on stochastic finite state channels is presented.
The analysis can be performed in a transform
domain showing an easy computational structure.
The transform coefficients are connected with the
coset probabilities by a complex generalization of
the Walsh-Hadamard transformation.

I. Introduction

A performance assessment of block codes can be achieved
by rating the decoding events, where the channel has to
be included. In this paper an analytical method for
evaluating coset probabilities of algebraically decoded linear
block codes over prime fields is presented. The codes are
used on burst error channels. The ideas explained are
part of a general modal approach to coding schemes [1].

II.  Stochastic  Finite  State  Channel 

Input, output and state of the channel are described by
a finite alphabet $\mathcal{X}$ of input symbols $x$, a finite alphabet
$\mathcal{Y}$ of output symbols $y$ and a finite set $\mathcal{S}$ of $S$ states $s$.
The conditional probability $p(y, s' | s, x)$ is the probability,
when the channel is in state $s$, that the input $x$ results
in an output $y$ and the next state will be $s'$. These
probabilities are the elements $d_{ss'}(y|x) = p(y, s' | s, x)$ of a family of
input-output matrices $D(y|x) = [d_{ss'}(y|x)]$. The sets $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{S}$
and the matrix family $\{D(y|x)\}$ define the stochastic
sequential machine $\mathcal{D}$. The machine $\mathcal{D}$ together with the
initial probability distribution $\sigma_0$ on the state set $\mathcal{S}$ form
the stochastic automaton $\mathcal{D}$. Later, the symbols $x$, $y$ are
considered as elements of a prime field $F = GF(p)$. For
the assumed symmetric channels, the matrix family is
represented by the finite set $\{ D_f = D(y = x + f \mid x) \mid f \in F \}$.

HI.  Modal  analysis  of  nonbinary  codes 

We consider linear $(n, k)$ block codes $V_0$ over the prime
field $F = GF(p)$ with $m = n - k$ check digits. The
decoder is modeled as a deterministic acceptor processing
the output symbols of the channel. The syndrome
$s = \sum_{\nu=1}^{n} y^{(\nu)} h_\nu$ is regarded as the decoder state, where
each $h = [h_{m-1}, \dots, h_1, h_0]^T$ is a column of the parity
check matrix $H$. It is useful to evaluate the syndrome
step by step, leading to partial syndromes and the recursion
$s^{(\nu)} = s^{(\nu-1)} + y^{(\nu)} h_\nu$; $\nu = 1, \dots, n$, where $s^{(0)} = 0$.
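
The step-by-step syndrome evaluation can be illustrated with a short sketch. The parity-check columns and the received words below are hypothetical and chosen only for illustration (a toy code over GF(3)); they are not taken from the paper.

```python
def partial_syndromes(h_columns, y, p):
    """Return the partial syndromes s^(0), ..., s^(n) over GF(p).

    h_columns: list of the n columns h_nu of the parity-check matrix H,
               each a tuple of m field elements.
    y:         list of the n received symbols y^(nu).
    p:         the prime defining GF(p).
    """
    m = len(h_columns[0])
    s = [0] * m                      # s^(0) = 0
    history = [tuple(s)]
    for h, y_nu in zip(h_columns, y):
        # s^(nu) = s^(nu-1) + y^(nu) * h_nu, component-wise modulo p
        s = [(s_j + y_nu * h_j) % p for s_j, h_j in zip(s, h)]
        history.append(tuple(s))
    return history


# Hypothetical columns of H for a toy (4, 2) code over GF(3).
H_COLUMNS = [(1, 0), (0, 1), (1, 1), (1, 2)]

if __name__ == "__main__":
    print(partial_syndromes(H_COLUMNS, [2, 2, 1, 0], 3))  # a codeword
    print(partial_syndromes(H_COLUMNS, [2, 2, 1, 1], 3))  # last symbol in error
```

Each entry of the returned list corresponds to one trellis section; the last entry is the full syndrome, which is zero exactly when the received word is a codeword.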

To each recursion step a section of the code trellis is
assigned. The trellis can be analytically described by
trellis matrices in the form of the Kronecker product
$M_h(y) = M_{h_{m-1}}(y) \otimes \cdots \otimes M_{h_1}(y) \otimes M_{h_0}(y)$ of elementary
trellis matrices $M_h(y) = \mathrm{circ}(0 \cdots 1\, 0 \cdots 0)$. The one
element in the first row of the circulant matrix $M_h(y)$ is
in the column $t = y h \bmod p$. The eigenvectors of $M_h(y)$
are the columns of the modal matrix $W_m = W_1 \otimes W_{m-1}$,
where $W_1 = [w^{ij}]$ and $w$ is a primitive complex $p$-th root of unity. Using $W_m$ for
similarity transformation of the trellis matrices into the transform
domain, we obtain the spectral matrices in the form
of the Kronecker product $\Lambda_h(y) = W_m^{-1} M_h(y) W_m =
\Lambda_{h_{m-1}}(y) \otimes \cdots \otimes \Lambda_{h_1}(y) \otimes \Lambda_{h_0}(y)$ of elementary spectral
matrices $\Lambda_h(y) = \mathrm{diag}\{w^{it}\}$, where $t = y h \bmod p$.

¹ The author is on leave from the FernUniversity of Hagen.

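IV. Modal Analysis of the Coded System
The channel-decoder cascade can be represented by a
weighted trellis. An analytical description of the weighted
trellis can be obtained by a weighted trellis matrix

$$U_H = \prod_{\nu=1}^{n} U_{h_\nu}, \quad \text{where} \quad U_h = \sum_{y \in GF(p)} M_h(y) \otimes D_y.$$

Then, the row vector of the $T = p^m$ coset probabilities
$P_t$ is $P = [P_0, P_1, \dots, P_{T-1}] = (\tau_0 \otimes \sigma_0) U_H (I \otimes e)$, with
$\tau_0 = [1, 0, \dots, 0]$, $e = [1, \dots, 1]^T$ and the identity matrix $I$.
The mapping into the transform domain can be achieved
by using $\mathbf{T} = W_m \otimes I$ and results in the weighted
spectral matrix $\Theta_H = \prod_{\nu=1}^{n} \Theta_{h_\nu} = \prod_{\nu=1}^{n} \mathrm{diag}\{\Theta_{i,h_\nu}\}$,
where $\Theta_h = \mathbf{T}^{-1} U_h \mathbf{T}$ and $\Theta_{i,h} = \sum_{y \in GF(p)} D_y \, w^{\langle i, y h^T \rangle}$;
$\langle a, b \rangle = \sum_k a_k b_k \bmod p$, with $a$ and $b$ represented by their $p$-ary digit vectors $\mathrm{vec}_p(a)$, $\mathrm{vec}_p(b)$.

Premultiplying $\Theta_H$ by $(\tilde{\tau}_0 \otimes \sigma_0) = (\tau_0 \otimes \sigma_0)\mathbf{T}$ and
postmultiplying the result by $I \otimes e$ yields the transform
coefficients $Q_i$ given in the vector $Q = [Q_0, Q_1, \dots, Q_{T-1}] =
(\tilde{\tau}_0 \otimes \sigma_0) \Theta_H (I \otimes e)$. The coefficients $Q_i$ and probabilities
$P_t$ are connected via the complex Walsh-Hadamard
transformation $P_t = \frac{1}{T} \sum_{i=0}^{T-1} Q_i \, w^{-\langle i, t \rangle}$, where $T = p^m$.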
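
As a rough illustration of the transform pair between coefficients and coset probabilities, the sketch below implements a complex Walsh-Hadamard transformation over $p$-ary digit vectors. It assumes $w = e^{j2\pi/p}$, a forward transform $Q_i = \sum_t P_t w^{\langle i,t \rangle}$, and an inverse with the factor $1/T$; the sign convention and the helper names are assumptions made for this example, not details confirmed by the source.

```python
import cmath


def digits(a, p, m):
    """p-ary digit vector of the integer a (least significant digit first)."""
    return [(a // p ** k) % p for k in range(m)]


def inner(a, b, p, m):
    """<a, b>: inner product of the p-ary digit vectors of a and b, modulo p."""
    return sum(x * y for x, y in zip(digits(a, p, m), digits(b, p, m))) % p


def coeffs_from_probs(P, p, m):
    """Assumed forward transform: Q_i = sum_t P_t * w^{<i,t>}."""
    w = cmath.exp(2j * cmath.pi / p)
    T = p ** m
    return [sum(P[t] * w ** inner(i, t, p, m) for t in range(T)) for i in range(T)]


def probs_from_coeffs(Q, p, m):
    """Inverse transform: P_t = (1/T) * sum_i Q_i * w^{-<i,t>}."""
    w = cmath.exp(2j * cmath.pi / p)
    T = p ** m
    return [sum(Q[i] * w ** (-inner(i, t, p, m)) for i in range(T)) / T
            for t in range(T)]
```

With this convention $Q_0$ is the sum of all coset probabilities, so it equals one for a valid distribution.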

V.  Conclusion 

The automata model of the channel-decoder cascade allows
an analytical evaluation of the coset probabilities of
nonbinary block coded systems. By means of the modal
analysis the task can be shifted into a transform domain
of easier computational structure and reduced storage
requirements. The domains are connected by a complex
Walsh-Hadamard transformation. The results are exact
within the framework of the model. The proposed fast
algorithm is suitable for implementation on a computer.

Abstract - Sequential decoding for the Gilbert-Elliott
channel is considered. The decoding procedure
capacity $C_D$ is defined to be the supremum of the
rates for which there exists a code that gives arbitrarily
small decoding error probability. For different
assumptions of the decoder's knowledge of the channel
states, expressions for $C_D$ are derived.

I. Introduction

Assume that a tree code is used together with sequential
decoding to communicate over the Gilbert-Elliott channel. Let
$P(\mathcal{E})$ denote the average probability of decoding error over the
ensemble of random, infinite depth tree codes. In this paper
we address the question: "When will $P(\mathcal{E}) \to 0$?".

Consider the Gilbert-Elliott channel model and denote the
error probabilities in the Good and Bad states by $\epsilon_G$ and $\epsilon_B$,
respectively. Furthermore, let $P_G$ and $P_B$ denote the fraction
of time spent in the Good and Bad states, respectively.

II.  Decoding  procedure  capacity 

Let us define the decoding procedure assumptions, $D$. The
optimistic assumption, $D = o$, assumes that the decoder has
complete knowledge of the channel state, which could be given
by a genie. The pessimistic assumption, $D = p$, assumes that
the decoder is neither given any channel state information nor
tries to make any estimate of it. Given the decoding procedure
assumption $D$ and the use of the Gilbert-Elliott channel, let
$C_D$ denote the supremum of the rates for which we can guarantee
that there exists a code that gives an arbitrarily small
decoding error probability $P(\mathcal{E})$. We will call $C_D$ the decoding
procedure capacity.

We have proved that the decoding procedure capacities are
given by

$$C_o = P_G \cdot C_{BSC}(\epsilon_G) + P_B \cdot C_{BSC}(\epsilon_B)$$

and

$$C_p = P_G \cdot (C_{BSC}(\epsilon_G) - h(b)) + P_B \cdot (C_{BSC}(\epsilon_B) - h(g))$$

$$\phantom{C_p} = C_o - (P_G \cdot h(b) + P_B \cdot h(g)),$$

where $b$ and $g$ denote the transition probabilities from Good
to Bad and from Bad to Good, respectively, in the channel
model.
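
The two capacity expressions are easy to evaluate numerically. The sketch below assumes that $h(\cdot)$ is the binary entropy function in bits, that $C_{BSC}(\epsilon) = 1 - h(\epsilon)$, and that the fractions of time are the stationary probabilities of the two-state chain, $P_G = g/(b+g)$ and $P_B = b/(b+g)$; the function names are illustrative only.

```python
from math import log2


def h(x):
    """Binary entropy function in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def c_bsc(eps):
    """Capacity of a binary symmetric channel with crossover probability eps."""
    return 1.0 - h(eps)


def decoding_procedure_capacities(eps_g, eps_b, b, g):
    """Return (C_o, C_p) for a Gilbert-Elliott channel.

    b: transition probability Good -> Bad
    g: transition probability Bad -> Good
    """
    p_g = g / (b + g)   # assumed: stationary probability of the Good state
    p_b = b / (b + g)   # assumed: stationary probability of the Bad state
    c_o = p_g * c_bsc(eps_g) + p_b * c_bsc(eps_b)
    c_p = c_o - (p_g * h(b) + p_b * h(g))
    return c_o, c_p


if __name__ == "__main__":
    print(decoding_procedure_capacities(eps_g=0.01, eps_b=0.3, b=0.01, g=0.1))
```

The gap between the two values, $P_G h(b) + P_B h(g)$, shrinks as the transition probabilities become small, that is, as the channel state becomes more persistent.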

Theorem 1 Given the Gilbert-Elliott channel, the decoding
procedure assumptions, and the use of a rate $R$ random,
infinite depth tree code with the stack decoder, then for any rate
$R < C_D$ and $\eta \in \mathbb{Z}^+$,

$$P(N > \eta) \to 0 \quad \text{if} \quad \eta \to \infty,$$

where $N$ is the number of computations in an incorrect subtree.

When we wish to transmit over an ordinary Discrete Memoryless
Channel at rates (above $R_{comp}$ and) close to its capacity,
it is sufficient to allow the number of computations of sequential
decoding to go to infinity to be able to guarantee that
$P(\mathcal{E})$ can be chosen arbitrarily small. We will show that this
is also sufficient for transmission at rates close to $C_D$, which is
the reason why we call these rates "decoding procedure
capacities".

Theorem 2 Given the assumptions of Theorem 1, then for
any rate $R < C_D$ the average probability of decoding error

$$P(\mathcal{E}) \to 0,$$

if the number of computations, $N$, is allowed to go to $\infty$.

Since the important condition in Theorem 2 is that $R < C_D$, it
is clear that the theorem's statement, given the decoding
procedure assumptions, is equivalent to stating that the maximal
transmission rate over the Gilbert-Elliott channel is at least
the rate $C_D$.
In the pessimistic case we can interpret this as follows. For
arbitrarily small $P(\mathcal{E})$, there exists a code such that the
transmission rate will be (at least) $C_p$, even without any knowledge
of the channel state or any attempt to estimate it.

III.  Channel  capacity 

A common method to lower-bound $C_{GE}$ is to calculate
$C_{BSC}(\epsilon)$, where $\epsilon = P_G \cdot \epsilon_G + P_B \cdot \epsilon_B$, but it turns out that $C_p$
is a better lower bound for channels with a stable behaviour.
The optimistic case helps us to find a stronger result:

Theorem 3 Given that the receiver has complete channel
state knowledge, then the channel capacity for the
Gilbert-Elliott channel is equal to

$$C_{GE}^{R} = C_o.$$

From the proof of Theorem 3 follows immediately

Corollary 4 Given that both transmitter and receiver have
complete knowledge of the channel state sequence then for the
channel capacity of the Gilbert-Elliott channel we have

$$C_{GE}^{TR} = C_{GE}^{R}.$$

It should be noted that the capacities $C_{GE}^{R}$ and $C_{GE}^{TR}$, in
contrast to what is the case for $C_D$, are parameters purely
dependent on the channel's properties and that nothing is
assumed about the decoding method. In the derivations of $C_D$
we assume sequential decoding, but by deriving them we show
that they are achievable rates as such, given the decoding
procedure assumptions.

Abstract - The use of real time channel estimation
information is known to result in significant performance
advantages in coded systems operating on fading
channels. Little work has been done on fast and accurate
channel signal-to-noise ratio estimation in very noisy
channels (SNR < 0 dB). In this paper, a new real time
channel estimation technique using the Viterbi algorithm
and fuzzy logic concepts is presented.

I-  INTRODUCTION 

The  distortion  imposed  by  the  channel  on  the  transmitted 
data  stream  in  a  digital  communication  system  is  normally 
observed  in  the  form  of  errors  at  the  receiver.  The  main 
objectives  of  any  communication  system  are  to  minimise  the 
number  of  these  errors  and  to  maximise  the  throughput  of 
the  system.  In  order  to  optimise  the  system  performance 
adaptively  in  response  to  channel  conditions,  an  estimate  of 
the  receiver’s  error  rate  is  required  to  initiate  control  actions. 

Real  time  channel  estimation  techniques  [1]  are  useful  tools 
for  obtaining  an  on-line  estimate  of  the  channel  state. 
Previous work [2] in this area could not estimate the
channel SNR quickly and accurately under very noisy conditions.

As  a  by-product  of  the  Viterbi  decoding  algorithm,  the 
cumulative  metric  of  the  most  likely  path  through  the 
decoder trellis is available as additional information
besides  the  decoded  output  symbol.  This  information  may  be 
interpreted  as  a  measure  for  the  signal-to-noise  ratio  (SNR) 
in  the  transmission  channel  [3]  and  consequently  the  error 
probability  of  the  decoded  sequence  could  be  estimated. 

In  the  channel  estimation  scheme  described  here,  the  path 
metric  values  at  the  output  of  the  Viterbi  decoder  are  applied 
to  a  Fuzzy-logic  unit,  which  retrieves  this  information  by 
means  of  post-processing  and  mapping  into  membership 
functions  (MF).  Channel  SNR  estimation  is  made  after  a 
fixed  number  of  decoding  steps. 
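
The mapping of a path-metric value into membership functions, followed by defuzzification, can be sketched as follows. The sketch assumes triangular membership functions centred on stored mean values and centroid defuzzification, as the fuzzy-logic unit is described in this paper; the rule-base values, the half-width and all names are hypothetical placeholders rather than trained data.

```python
def triangular(x, center, half_width):
    """Triangular membership value: 1 at the center, 0 beyond +/- half_width."""
    return max(0.0, 1.0 - abs(x - center) / half_width)


def estimate_snr(metric, rule_base, half_width):
    """Centroid estimate of the SNR in dB from one post-processed metric value.

    rule_base maps each SNR step in dB to the mean metric stored for that step.
    Returns None when the metric matches no membership function.
    """
    memberships = {snr: triangular(metric, mean, half_width)
                   for snr, mean in rule_base.items()}
    total = sum(memberships.values())
    if total == 0.0:
        return None
    # Centroid: membership-weighted average of the SNR steps.
    return sum(snr * mu for snr, mu in memberships.items()) / total


# Hypothetical rule base: mean metric falling linearly with SNR, -7 dB to 27 dB.
RULE_BASE = {snr: 100.0 - 2.0 * snr for snr in range(-7, 28)}

if __name__ == "__main__":
    print(estimate_snr(89.0, RULE_BASE, half_width=3.0))
```

A metric that falls between two stored means activates both neighbouring membership functions, so the centroid yields an estimate between the two 1 dB steps.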

II-  THE  FUZZY-LOGIC  UNIT 
Ulm, Germany

After a fixed number of decoding steps the fuzzy-logic unit
(FLU) reads the transformed trellis side-information and
computes the membership values for each SNR membership
function in steps of 1 dB in a range between -7 dB and 27 dB.
The rule base consists of a small look-up table, which
contains the mean values of the input information obtained
during off-line training for SNRs between -10 dB and 30 dB in
steps of 1 dB. Each membership function is triangular,
where the highest membership value is assigned to
the fuzzy input being equal to the stored mean value for the
$k$th dB step, with $k \in \{-7, -6, \dots, 27\}$. After comparing the
input values with the rule base, a vector of membership
values is obtained, which represents a fuzzy description of
the SNR estimate. In order to reduce the variance of the
channel SNR estimate, the membership values are
defuzzified by the centroid inference method [4,5].

III. SIMULATION RESULTS

Simulation results have shown that after receiving 2 kbit of
decoded output symbols (i.e. eight taps), the estimated value
for transmission $E_b/N_0$ between 0 dB and 25 dB can be
obtained with 100% certainty with variance of 0.25 dB.

Abstract - Fundamental properties of the ambiguity
function and the uncertainty relation of Fourier
transforms  assert  a  fundamental  limitation  on  the 
ability  of  any  single  radar  waveform  to  simultane¬ 
ously  resolve  targets  closely  spaced  in  both  time-delay 
and  Doppler-shift.  In  this  paper,  a  method  of  using 
multiple  waveform  sets  to  make  high-resolution  delay- 
Doppler  measurements  is  proposed.  The  fundamen¬ 
tal  theorem  that  supports  this  method  is  established. 
Explicit  optimal  phase,  frequency,  and  joint  phase- 
frequency  coded  waveform  sets  having  constant  am¬ 
plitude  are  presented,  as  well  as  algorithms  for  the 
construction  of  such  sets  of  arbitrary  size. 


I.  Introduction 

A radar or other pulse-echo delay-Doppler measurement system
can be viewed as an imaging system that forms a
delay-Doppler image of the illuminated environment. When a radar
system is viewed in this way, it becomes clear that its
delay-Doppler resolution is determined by its imaging point-spread
function or ambiguity function of the illuminating signal $s(t)$,
defined as

$$\chi(\tau, \nu) = \int_{-\infty}^{\infty} s(t)\, s^*(t - \tau)\, e^{-j 2\pi \nu t} \, dt.$$


We  have  some  control  in  selecting  this  point-spread  function. 
However,  some  fairly  strong  constraints  on  the  mathematical 
form  of  the  ambiguity  function  prohibit  the  ability  to  simul¬ 
taneously  achieve  high-resolution  in  both  delay  and  Doppler. 
For  example,  the  total  volume  under  the  squared  modulus  of 
the  ambiguity  function  of  a  signal  with  energy  E  is  always  E2, 
while  the  peak  of  the  ambiguity  function  always  has  height 
E.  This  is  true  for  any  single  waveform  s(t)  and  cannot  be 
changed  by  any  modulation  scheme. 

One way around this delay-Doppler resolution constraint is
to make multiple pulse-echo measurements using waveforms
having sufficiently different ambiguity functions and then
process and combine the individual waveform returns to form a
high-resolution delay-Doppler image. This leads to the
introduction of the composite ambiguity function of a set of signals.
A main theorem on the composite ambiguity function is
established to support the validity of our idea.


II.  Main  Theorem  on  Composite  Ambiguity 
Function 

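Theorem 1 For a set of signals $\{s_0(t), s_1(t), \dots, s_{K-1}(t)\}$
with total energy

$$E_t = \sum_{i=0}^{K-1} \int_{-\infty}^{\infty} |s_i(t)|^2 \, dt,$$

the volume $V$ under their associated composite ambiguity
function $C(\tau, \nu)$ defined as

$$V = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Big| \sum_{i=0}^{K-1} \chi_i(\tau, \nu) \Big|^2 \, d\tau \, d\nu,$$

where $\chi_i$ is the ambiguity function of $s_i(t)$, satisfies

$$\frac{E_t^2}{K} \le V = \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} \left| \int_{-\infty}^{\infty} s_i(t)\, s_j^*(t) \, dt \right|^2 \le E_t^2.$$

Furthermore, the minimum is achieved when $\{s_0(t), s_1(t),
\dots, s_{K-1}(t)\}$ is a set of equal-energy orthogonal signals.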
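
The bounds in the theorem can be checked numerically on sampled signals. The sketch below assumes that the volume equals the double sum of squared inner products appearing in the middle of the inequality and uses short sample vectors as discrete-time stand-ins for the waveforms; the example signals are hypothetical.

```python
def inner(u, v):
    """Discrete inner product sum_n u[n] * conj(v[n])."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def volume(signals):
    """Double sum over i, j of |<s_i, s_j>|^2."""
    return sum(abs(inner(si, sj)) ** 2 for si in signals for sj in signals)


def total_energy(signals):
    """Sum of the individual signal energies."""
    return sum(abs(inner(s, s)) for s in signals)


if __name__ == "__main__":
    orthogonal = [(1, 0), (0, 1)]        # equal-energy, orthogonal
    identical = [(1, 0), (1, 0)]         # fully correlated
    for sigs in (orthogonal, identical):
        e_t, k = total_energy(sigs), len(sigs)
        print(e_t ** 2 / k, volume(sigs), e_t ** 2)
```

The orthogonal pair attains the lower bound $E_t^2/K$, while two identical signals attain the upper bound $E_t^2$, which is the single-waveform situation.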


This theorem provides a general rule for selecting signal sets
for waveform-diverse measurements. However, a point-spread
function with a small volume is not sufficient for obtaining a
high-resolution image. It has been shown quantitatively [1, Ch. 3]
that in addition to small ambiguous volume, an ideal
point-spread function for delay-Doppler radar imaging should
have a thumbtack shape. This is achieved by appropriately
selecting modulation schemes for the coded waveform set.


III.  Coded  Waveforms  Design 
We  will  study  only  coded  waveforms  because  the  structural 
constraints  of  these  waveforms  result  in  designs  that  can  be 
easily  implemented  in  real  systems.  Particular  families  of 
waveforms  that  are  investigated  include 

1. Phase-modulated signals;

2. Frequency-modulated signals;

3. Frequency- and phase-modulated signals.

The coded waveform sets we have investigated contain $K$ signals
$\{s_0(t), s_1(t), \dots, s_{K-1}(t)\}$ where

$$s_i(t) = \sum_{n=0}^{N-1} p(t - nT)\, e^{j 2\pi d_{i,n} t / T}\, e^{j \phi_{i,n}} \qquad (1)$$

consists of a sequence of $N$ baseband pulses $p(t)$ of length $T$ with
finite energy. Each pulse is modulated by an integral frequency
modulating index $d_{i,n}$ and a phase modulating index $\phi_{i,n}$ that
can take on any real number.

The modulating patterns of a set of coded waveforms determine
the distribution of the ambiguity sidelobes of its resulting
composite ambiguity function. The phase modulating pattern
controls the polarities of the ambiguity sidelobes while the
frequency modulating pattern determines their locations. The
examples that will be shown demonstrate that by appropriately
selecting the phase modulating pattern, it is possible
to cancel the ambiguity sidelobes, and by selecting the
frequency modulating pattern, we can spread out the ambiguity
sidelobes so that the resulting composite ambiguity function
resembles a thumbtack. The combination of both phase and
frequency modulations gives the best result.

References 

[1]  J.  C.  Guey,  Sequence  and  Waveform  Set  Design  for  Radar  and 
Communication  Systems ,  Ph.D.  Dissertation,  Purdue  Univer¬ 
sity,  West  Lafayette,  IN,  1995. 




Reduced  Complexity  Symbol-by-Symbol  Demodulation 

Michael P. Fitz¹ and Saul B. Gelfand

School  of  Electrical  &  Computer  Engineering,  Purdue  University, 

West  Lafayette,  IN,  USA  47907-1285 


Abstract  —  Reduced  complexity  symbol-by-symbol 
demodulation  is  examined.  We  examine  the  perfor¬ 
mance  with  standard  complexity  reduction  techniques 
(e.g.,  M-algorithm  and  T-algorithm)  and  then  derive 
a  reduced  state  symbol-by-symbol  demodulation  al¬ 
gorithm  which  makes  symbol-by-symbol  demodula¬ 
tion  performance  and  complexity  competitive  with  se¬ 
quence  estimation. 

I.  Introduction 

Symbol-by-symbol demodulation (SYD) structures (e.g., [1]),
while optimum in terms of minimizing symbol error probability,
typically have a complexity greater than sequence demodulation
(SED) techniques (e.g., the Viterbi algorithm) for a fixed
decoding lag. Consequently, when only hard decision outputs
are required, SED techniques are invariably used in practice.
However, soft decision metrics are often needed (e.g., in
interleaved or concatenated coding schemes), and hence reduced
complexity high performance SYD structures are of interest.
In this paper we propose a new algorithm that produces
symbol-by-symbol metrics at roughly the same complexity as
SED without a significant loss in performance and examine
methods to significantly reduce the complexity of SYD.

II.  Overview  of  Optimum  Recursive  Estimation 

Consider a modulation with memory described by a time-invariant
Markov chain transmitting $m$ bits of information per
symbol corrupted by AWGN. Define $K$ as the decoding lag,
$\sigma_k$ to be the modulation state, $\|\sigma_k\|$ to be the cardinality of
the modulation state, and $w(k)$ to be all the observations until
time $k$. We also use $\Omega_I$ as the transmitted symbol space and

$$I_n(k) = \{ I_{k-n}, I_{k-n+1}, \dots, I_k \}$$

to represent the last $n+1$ transmitted symbols.

Assuming equally likely transmitted symbols, recursive
symbol-by-symbol and sequence estimation algorithms have
the same three-part structure: 1) measurement update,
2) metric production, and 3) sufficient statistic update.
The measurement update takes the sufficient statistics
from the previous time iteration and the latest measurement
and computes an updated information state. From
this information state the output metric and the sufficient
statistic for the current iteration can be produced.
The forward recursion optimum SYD has sufficient statistics
$p(\sigma_k \,|\, w(k-1))$ (the posterior probability mass function
(pmf) of the modulation state) and $p(I_{k-i} \,|\, \sigma_k, w(k-1))$,
$i = 1, \dots, K$ (the conditional posterior pmfs of the transmitted
symbols). Similarly the sufficient statistics for optimum
SED are $\max_{I_{K-1}(k-1) \in \Omega_I^K} p(I_{K-1}(k-1), \sigma_k \,|\, w(k-1))$
(the largest posterior pmf for each modulation state)
and $\arg\max_{I_{K-1}(k-1) \in \Omega_I^K} p(I_{K-1}(k-1), \sigma_k \,|\, w(k-1))$ (the
sequence that achieves the maximum). It should be noted
that SED can operate on log-likelihood functions while SYD
cannot, but the exponential function evaluation needed in
the measurement update for SYD could easily be done
with a lookup table. The complexity of optimum SYD is
$O(K M^2 \|\sigma_k\|)$ where $M = 2^m$ and the complexity of SED
is $O(M \|\sigma_k\|)$.

III.  Complexity  Reduction  Techniques 

Often  in  practice  the  complexity  of  an  optimal  demodulator  is 
prohibitive  and  reduced  complexity  demodulation  techniques 
need  to  be  used.  Since  the  structure  of  SYD  is  so  similar  to 
SED  the  best  complexity  reduction  techniques  are  also  similar. 
Two  of  the  most  applicable  techniques  are  the  M-algorithm  [2] 
which  saves  the  M  most  likely  sequences  and  the  T-algorithm 
[3]  which  saves  and  processes  only  the  statistics  or  posterior 
likelihood  values  which  break  a  threshold  each  iteration.  The 
T-algorithm  version  of  the  SYD  provides  the  best  average 
complexity  performance  tradeoff  and  the  threshold  for  this 
algorithm  can  be  chosen  in  a  principled  fashion.  Conversely, 
the  T-algorithm  version  of  the  SYD  has  the  disadvantage  of 
having  variable  complexity  and  memory  requirements. 
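
The two pruning rules can be contrasted in a few lines. The sketch below treats each candidate as a (sequence, posterior) pair and applies the T-algorithm threshold to the normalized posterior; this normalization, the candidate values and the function names are assumptions made for illustration, not details taken from [2] or [3].

```python
def prune_m(candidates, m):
    """M-algorithm style pruning: keep the m most likely candidates."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:m]


def prune_t(candidates, threshold):
    """T-algorithm style pruning: keep candidates whose normalized
    posterior reaches the threshold (the survivor count varies)."""
    total = sum(p for _, p in candidates)
    return [(seq, p) for seq, p in candidates if p / total >= threshold]


if __name__ == "__main__":
    # Hypothetical posterior values for four candidate sequences.
    cands = [("00", 0.5), ("01", 0.3), ("10", 0.15), ("11", 0.05)]
    print(prune_m(cands, 2))
    print(prune_t(cands, 0.1))
```

The first rule fixes the number of survivors and hence the per-iteration work, while the second lets the survivor count follow the data, which is the source of the variable complexity and memory requirement noted above.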

Additionally, a reduced state SYD algorithm analogous to
the RSSE [4] can be derived using the approximation:

A1: $\{I_{K-1}(k-1), \sigma_k\}$ is deterministic given
$\{I_{K-1}(k-1), \sigma_k\} \in \tilde{\sigma}_k$ and $w(k-1)$

where $\tilde{\sigma}_k$ is the reduced state partition. The recursion
resulting from (A1) has a similar form as the optimal algorithms
and the sufficient statistic is $p(\tilde{\sigma}_k \,|\, w(k-1))$ (the posterior
pmf of the state partition) and $\hat{I}_{k-i}(\tilde{\sigma}_k, w(k-1))$, $i = 1, \dots, K$
(the conditional decisions). The complexity of this reduced
state demodulator is $O(K M \|\tilde{\sigma}_k\|)$. For medium to high SNR
and when $\tilde{\sigma}_k = \sigma_k$, this reduced state SYD has roughly the
same complexity as SED and produces performance almost
indistinguishable from the optimum estimator.

IV.  Conclusions 

The combination of the reduced state symbol-by-symbol
demodulation and the T-algorithm provides a demodulation
algorithm that maximizes average performance versus
computational complexity while still maintaining a reasonable
maximum complexity and memory requirement.

Abstract - We propose a new representation method for
multicomponent chirp signals. This representation is based on
the 2-D frequency-shear plane. Analytical results for the chirp
signals are presented.

I. Introduction

The conventional tools for analysis of this class of signals are
time-frequency distributions (TFD), which can be interpreted as a
smoothed version of the Wigner distribution (WD) of the signal to be
analyzed [1]. Much current effort has been put into designing
kernel functions that achieve better performance in suppressing cross
terms while retaining high time and frequency resolution of the auto
terms. However, those kernel functions are based on a rectangular
tessellation of the time-frequency plane and therefore they may not
be suitable for representing certain types of signals such as chirp
signals. In this paper, we introduce a frequency-shear distribution
that maps a signal onto the frequency-shear plane. The properties of
this distribution are investigated and analytical results are
presented.

II.  The  Frequency-Shear  Distribution 

Let us define a transform of the signal $x(t)$ as follows

$$Q_x(\nu, q) = \int_{-\infty}^{\infty} x(t)\, g(t)\, e^{-j(\nu t + q t^2 / 2)} \, dt \qquad (1)$$

where $\nu$ and $q$ denote frequency and shear, respectively, and $g(t)$
is a weighted window function. The frequency-shear distribution
(FSD) is defined as the squared magnitude of $Q_x(\nu, q)$:

$$\Theta_x(\nu, q) = |Q_x(\nu, q)|^2 = \frac{1}{2\pi} \iint W_x(t, \omega)\, W_g(t, \omega - \nu - q t) \, dt \, d\omega$$

where $W(t, \omega)$ is the so-called Wigner distribution. From the
definition, we know that the time-frequency function of the signal
is weighted with a chirplet function, which corresponds to the
local structure of the chirp signal. The weighting function has an
oblique analysis cell on the time-frequency plane that is suitable
for analyzing multicomponent chirp signals.
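
A rough numerical sketch of the frequency-shear distribution is given below. It assumes a chirplet kernel $e^{-j(\nu t + q t^2/2)}$, a Gaussian window $g(t) = e^{-t^2/(2\sigma^2)}$ and a simple Riemann sum in place of the integral; these choices and all names are assumptions made for illustration rather than the authors' exact definitions.

```python
import cmath
import math


def fsd(x, times, dt, nu, q, sigma):
    """Squared magnitude of the windowed chirplet projection at (nu, q)."""
    acc = 0j
    for t, x_t in zip(times, x):
        g = math.exp(-t * t / (2.0 * sigma * sigma))       # Gaussian window
        acc += x_t * g * cmath.exp(-1j * (nu * t + 0.5 * q * t * t)) * dt
    return abs(acc) ** 2


def chirp(times, amp, omega0, gamma):
    """Linear chirp with start frequency omega0 and chirp rate gamma."""
    return [amp * cmath.exp(1j * (omega0 * t + 0.5 * gamma * t * t))
            for t in times]


if __name__ == "__main__":
    dt = 0.01
    times = [k * dt - 4.0 for k in range(801)]
    x = chirp(times, amp=1.0, omega0=5.0, gamma=2.0)
    grid = [(nu, q) for nu in (3.0, 4.0, 5.0, 6.0, 7.0)
            for q in (0.0, 1.0, 2.0, 3.0, 4.0)]
    print(max(grid, key=lambda point: fsd(x, times, dt, point[0], point[1], 1.0)))
```

On this coarse grid the largest value occurs where the analysis frequency and shear match the chirp's start frequency and chirp rate, because the kernel then cancels the signal phase completely.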

III. The Representation of Multicomponent
Chirp Signals

In  this  section,  we  will  consider  several  typical  signals  and 
analytically  calculate  their  FSDs. 

(1)  Single  chirp  signal 

A linear chirp signal with constant magnitude has the following
WD

$$W_x(t, \omega) = A^2 \, 2\pi \, \delta(\omega - \omega_0 - \gamma t),$$

which is highly concentrated about the chirp's linear instantaneous
frequency. In the FSD, we choose a Gaussian signal as the
weighting function. The FSD of this chirp signal is calculated and
given by

©  (v,q)  =  ,2^  if  cr»  1  (3) 

\a(q-r) 


The chirp signal is located at the point $(\omega_0, \gamma)$ on the
frequency-shear plane as expected.

(2) The signal of two chirp components

Assume that the signal consists of the sum of two chirp
signals

$$x_2(t) = A_1 e^{j(\omega_1 t + \gamma_1 t^2 / 2)} + A_2 e^{j(\omega_2 t + \gamma_2 t^2 / 2)}.$$

We consider the particular case $\gamma_1 = \gamma_2 = \gamma$ and give the WD of
the signal for comparison:

$$W_{x_2}(t, \omega) = A_1^2 \, 2\pi \, \delta(\omega - \omega_1 - \gamma t) + A_2^2 \, 2\pi \, \delta(\omega - \omega_2 - \gamma t) + 4\pi A_1 A_2 \cos((\omega_2 - \omega_1) t) \, \delta(\omega - \tfrac{1}{2}(\omega_1 + \omega_2) - \gamma t)$$

There are two auto terms centred at $\omega = \omega_1, \omega_2$ and a cross term
whose peak is located on the straight line $\omega = \tfrac{1}{2}(\omega_1 + \omega_2) + \gamma t$. The

FSD of the same signal is given by

A  A  r~  (  ( ) 2  {v-fih)2  ^ 

«'"w  1 

W<7-r)lt 


when  cr»  1  (5) 


K q-ri  v  q-y 


Wi-y)  |  V  q-y  ) 

where $\omega_\Sigma = \tfrac{1}{2}(\omega_1 + \omega_2)$, $\omega_\Delta = \omega_2 - \omega_1$. The cross term at $\nu = \omega_\Sigma$
reduces to

@  rvc,\  -  .  The  magnitude  of  the  cross  term  is 

"  ’  k(?-r) I 

largely  suppressed  if  compared  to  that  of  the  WD. 

The  analytical  results  given  in  this  paper  have  shown  that  the 
new  frequency-shear  distribution  provides  a  more  effective  tool 
for  analyzing  multicomponent  chirp  signals  than  the  generalized 
time-frequency  distribution.  It  can  suppress  the  cross  terms  and 
clearly  locate  the  signal  components  onto  the  frequency-shear 
plane.  In  fact,  this  advantage  is  caused  by  introducing  a  chirplet 
function  that  corresponds  to  the  structure  of  a  chirp  signal. 
Similarly,  signal  representation  can  be  extended  to  scale-shear  and 
shift-shear  plane  using  so-called  fan  bases  and  chevron  bases  [2] 
whose  elements  scale  or  translate  and  shear  in  the  time-frequency 
plane. 

Abstract - We present a new paradigm for the decentralized
detection problem under communication
constraints. In this problem, local sensors send a hard
decision or the likelihood ratio itself to the fusion center
based on the specified communication constraint.
The optimal system is designed by minimizing the risk
function. Also, a simpler system design procedure
based on Ali-Silvey distances is presented.

In  this  paper,  we  present  a  novel  paradigm  for  the  decen¬ 
tralized  detection  problem  under  communication  constraints. 
The  proposed  approach  is  flexible  and  combines  the  features 
of  both  centralized  and  hard  decision  decentralized  detection 
problems.  Under  specified  constraints,  we  design  the  opti¬ 
mum  decentralized  detection  scheme.  The  system  can  oper¬ 
ate  at  the  two  extremes,  i.e.,  it  can  be  a  centralized  system  or 
a  hard  decision  decentralized  detection  system,  or  anywhere 
in-between.  In  this  scheme,  local  sensors  send  a  hard  decision 
to  the  fusion  center  when  the  local  sensors  have  a  relatively 
high  confidence  in  the  decision,  otherwise  a  perfect  version  of 
the  LLR  (in  practice,  a  finely  quantized  version  of  the  LLR)  is 
sent.  The  degree  of  confidence  at  which  this  switch  is  made  is 
determined  by  the  specified  communication  constraint.  The 
fusion  center  makes  a  final  decision  based  on  the  received  in¬ 
formation  from  local  sensors. 

Observation samples at the local sensors are denoted by $r_i$,
$i = 1, \dots, M$, and their joint conditional densities are assumed
known. Based on its own observation $r_i$, each local sensor
makes a local decision $u_i \in \{0, 1, 2\}$, $i = 1, \dots, M$, where
$u_i = 0$ and $u_i = 1$ represent the fact that the $i$th local sensor
decides hypotheses $H_0$ and $H_1$ and correspondingly sends a
zero and a one to the fusion center; $u_i = 2$ indicates that
the $i$th local sensor computes and sends its LLR $L_i$ to the
fusion center. Let $u_{Fi}$ represent the output of sensor $i$,
i.e., $u_{Fi} = u_i$ when $u_i = 0$ or 1; $u_{Fi} = L_i$ when $u_i = 2$. Local
sensor outputs are transmitted to the fusion center where a
global decision is made based on the received data vector,
$u_F^T = [u_{F1} \; u_{F2} \; \cdots \; u_{FM}]$.

The probability that $L_i(r_i)$ is transmitted from sensor
$i$ is employed as a measure of the transmission rate on
channel $i$. We define

$$R_i = p(\text{send } L_i) = 1 - p(\text{send } H_0 \text{ or } H_1). \qquad (1)$$

Note that $R_i = 1$, $i = 1, \dots, M$, represents the centralized case,
and $R_i = 0$, $i = 1, \dots, M$, represents the case that hard
decisions are made at the local sensors [1, 2]. We are interested
in examining the flexible hybrid decision scheme in the
decentralized detection system with a lower average communication
rate (as compared to the centralized detection problem) on the
channels linking local sensors to the fusion center.
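
The hybrid local rule and the rate measure of Equation (1) can be sketched as follows. The sketch assumes that the confidence test is a pair of thresholds on the LLR, which is one natural reading of the scheme; the threshold names and the sample values are hypothetical, and the optimal thresholds would come from the design procedure rather than from this code.

```python
def sensor_output(llr, t_low, t_high):
    """Return (u, u_F) for one sensor.

    u = 1 or u = 0: a confident hard decision is sent (u_F = u).
    u = 2:          the LLR itself is sent (u_F = llr).
    """
    if llr >= t_high:
        return 1, 1
    if llr <= t_low:
        return 0, 0
    return 2, llr


def empirical_rate(llrs, t_low, t_high):
    """Fraction of observations for which the LLR is transmitted,
    an empirical counterpart of R_i in Equation (1)."""
    decisions = [sensor_output(llr, t_low, t_high)[0] for llr in llrs]
    return decisions.count(2) / len(decisions)


if __name__ == "__main__":
    samples = [-3.0, -1.0, 0.0, 1.0, 3.0]       # hypothetical LLR values
    print([sensor_output(llr, -2.0, 2.0) for llr in samples])
    print(empirical_rate(samples, -2.0, 2.0))
```

Moving the two thresholds together drives the rate toward zero (hard decisions only), while moving them far apart drives it toward one (the centralized case).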

Design  of  a  decentralized  detection  system  involves  specify¬ 
ing  both  the  local  decision  rules  and  the  global  decision  rule. 

0  Research  sponsored  by  Air  Force  Office  of  Scientific  Research, 
Air  Force  Systems  Command,  USAF,  under  Grant  No.  F49620-94- 

By employing the person-by-person optimization (PBPO)
methodology, the system is designed so as to minimize the risk function.
The system is specified by

• Optimal local decision rule at sensor $k$, $k = 1, \dots, M$:


Uk 


(2) 


• Optimal fusion rule:

$$\frac{p(u_F^* \,|\, H_1)}{p(u_F^* \,|\, H_0)} \;\; \underset{u_0 = 0}{\overset{u_0 = 1}{\gtrless}} \;\; \frac{C_F}{C_D} \qquad (3)$$

where $u_F^*$ is one of the $3^M$ possible combinations of
$u_F$.

Motivated by the difficulty and excessive computational
requirements of the above PBPO system design, a simplified
design procedure based on the class of Ali-Silvey distance
measures is also presented. Following the lead of [3, 4], we obtain
local decision rules that maximize the Ali-Silvey distances
between the conditional densities at the input of the fusion
center.

It  should  be  noted  that  both  system  designs  are  obtained 
under  communication  constraints  given  in  Equation  (1).  An 
example  is  considered  for  this  flexible  hybrid  decision  scheme 
for  the  decentralized  detection  problem.  Results  show  that 
the  system  performance  of  the  proposed  scheme  with  lower 
average  communication  rate  is  fairly  close  to  the  performance 
of  the  centralized  system. 
Abstract - The problem of change detection is considered
in a decentralized setting. A Bayesian framework
is introduced for this problem, and an optimal
solution is obtained for the case when the information
structure in the system is quasiclassical.

I.  Problem  Formulation 

The centralized version of the change detection problem,
where all the information about the change is available at a
single location, is well understood and has been solved under
a variety of criteria since the seminal work by Page [1]. However,
there are situations where the information available for
decision-making is decentralized, an example being link failure
detection in a large communication network. We focus
on this decentralized setting.

Consider a system with N sensors $S_1, \ldots, S_N$. At time
$k \in \{1, 2, \ldots\}$, sensor $S_i$ observes a random variable $X_k^{(i)}$
and forms a message $U_k^{(i)}$ (belonging to a finite set) based
on the information it has at time k. Assume that two-way
communication  is  possible  between  the  sensors  and  the  fusion 
center.  In  particular,  at  time  k  the  fusion  center  broadcasts  to 
each  sensor,  all  the  sensor  messages  it  received  at  time  k  —  1. 
This  means  that  at  time  k,  each  sensor  has  access  to  all  its 
observations  up  to  time  k  and  all  the  messages  of  all  the  other 
sensors  up  to  time  k  —  1,  and  the  fusion  center  has  access  to  all 
the  sensor  messages  up  to  time  k.  Based  on  the  sequence  of 
sensor  messages,  a  decision  about  the  abrupt  change  is  made 
at  the  fusion  center. 

We take the approach of Shiryayev [2] and assume that the
change time $\Gamma$ is geometrically distributed, i.e.,

$P(\Gamma = 0) = \nu$ and $P(\Gamma = i \mid \Gamma > 0) = \rho(1 - \rho)^{i-1}$.

Further, we assume that observations at each sensor $S_i$ are
independent, have a common pdf $f_0^{(i)}$ before the disruption,
and a common pdf $f_1^{(i)}$ from the time of disruption. We also
assume that the observations are independent from sensor to
sensor.

As in [4], we restrict the local memory at sensor $S_i$ to only
past messages. The resulting information structure is said
to be quasiclassical [3], and it makes the joint optimization
problem tractable via DP arguments. At any time k, the
one-step delayed information is the same for all members and is
given by $I_{k-1} = \{U_1, U_2, \ldots, U_{k-1}\}$.

With this understanding, the sensor function at $S_i$ at time
k can be regarded as a quantizer of the observation that
depends on $I_{k-1}$, i.e., $U_k^{(i)} = \phi_{k, I_{k-1}}^{(i)}(X_k^{(i)})$. The message
$U_k^{(i)}$ is assumed to take some value (say, $d_i$) in the finite set
$\{1, \ldots, D_i\}$. Further, we use the notation $\phi_k$, $d$ and $U_k$ to
denote the corresponding N-dimensional vectors.

The fusion center policy consists of selecting a stopping
time $\tau$ at which it is decided that the disruption has occurred.
In a Bayesian formulation, the goal is to minimize
a linear combination of the cost associated with an incorrect
decision ("false alarm") and the cost associated with the delay
in detecting the disruption under the assumption that the
"alarm" signal is correctly given. This leads to the following
optimization problem.

Problem (P): Minimize $E\left[1_{\{\tau < \Gamma\}} + c(\tau - \Gamma)1_{\{\tau \ge \Gamma\}}\right]$ over
all admissible choices of $\tau$ and $\phi_k^{(i)}$, $i = 1, \ldots, N$, $k = 1, 2, \ldots$,
where the constant $c > 0$ is the cost of each unit of delay.


II.  Results 

The solution to (P) is obtained using dynamic programming
(DP) arguments. A sufficient statistic at time k for the DP
recursions is the posterior probability of the change having
happened before time k given $I_k$, i.e., $p_k = P(\Gamma \le k \mid I_k)$. This
one-dimensional sufficient statistic is all that the sensors and
fusion center need to store at any given time k, and it can be
easily updated using the recursion given below in (1). The
complete solution to (P) is stated below.


Theorem 1 (i) The optimum fusion center policy is to stop
and declare that a change has occurred at the first k such
that $p_k > a$, where a is the solution to $c + A_J(a) = 1 - a$.
(ii) At each time k, it is optimum for the sensors to use
monotone likelihood ratio quantizers [4] whose thresholds depend
on $p_k$. Furthermore, a stationary set of sensor functions
is optimal, and this set is given by

$\phi^*_{p_k} = \arg\min_{\phi} W_J(\phi; p_k)$,

where the function J is the unique solution to

$J(p) = \min\{(1 - p),\ c + A_J(p)\}$, for all $p \in [0, 1]$,

and

$A_J(p) = \min_{\phi} W_J(\phi; p)$,

$W_J(\phi; p) = \sum_{d} J\!\left(\dfrac{g(d; \phi; p)}{f(d; \phi; p)}\right) f(d; \phi; p)$,

$g(d; \phi; p) = [p + (1 - p)\rho]\, q^{1,(1)}_{\phi^{(1)}}(d_1) \cdots q^{1,(N)}_{\phi^{(N)}}(d_N)$,

$f(d; \phi; p) = g(d; \phi; p) + (1 - p)(1 - \rho)\, q^{0,(1)}_{\phi^{(1)}}(d_1) \cdots q^{0,(N)}_{\phi^{(N)}}(d_N)$,

and

$q^{j,(i)}_{\phi^{(i)}}(d) = P_{f_j^{(i)}}\left(\phi^{(i)}(X^{(i)}) = d\right)$.

Finally, the recursion for $p_k$ is given by

$p_{k+1} = \dfrac{g(U_{k+1}; \phi^*_{p_k}; p_k)}{f(U_{k+1}; \phi^*_{p_k}; p_k)}$, $p_0 = \nu$. (1)
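
The posterior recursion can be sketched in a few lines. This is only an illustration under simplifying assumptions: the sensor quantizers are held fixed (in the theorem they depend on $p_k$), messages are indexed from 0 rather than 1, and the names `update_posterior`, `first_alarm_time`, `q1` and `q0` are hypothetical. Here `q1[i][d]` and `q0[i][d]` stand for the post-change and pre-change probabilities that sensor i emits message d.

```python
from math import prod


def update_posterior(p, rho, q1, q0, d):
    """One step of p_{k+1} = g / f for the message vector d.

    q1[i][d[i]] and q0[i][d[i]] are the message probabilities of
    sensor i after and before the change (fixed quantizers assumed).
    """
    like1 = prod(q1[i][d[i]] for i in range(len(d)))
    like0 = prod(q0[i][d[i]] for i in range(len(d)))
    g = (p + (1.0 - p) * rho) * like1
    f = g + (1.0 - p) * (1.0 - rho) * like0
    return g / f


def first_alarm_time(messages, nu, rho, q1, q0, threshold):
    """Run the recursion from p_0 = nu and stop the first time p_k > threshold."""
    p = nu
    for k, d in enumerate(messages, start=1):
        p = update_posterior(p, rho, q1, q0, d)
        if p > threshold:
            return k, p
    return None, p
```

With uninformative messages (`q1 == q0`) the update reduces to the prior drift $p + (1 - p)\rho$, which is a quick sanity check on the recursion.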
Abstract - The loss associated with a distributed signal detection
system as compared to a centralized scheme is evaluated with
respect to probability of error. Such a loss is numerically
computed for several members of the exponential family.


I.  INTRODUCTION 

An  important  problem  in  a  Distributed  Signal  Detection 
(DSD)  scheme  is  the  loss  associated  with  the  system.  Hence, 
error  analysis  plays  a  significant  role  in  the  design  of  DSD 
processors.  Here,  we  make  an  attempt  to  quantify  the  loss 
associated  with  a  DSD  system  as  compared  to  a  centralized 
scheme  by  providing  an  easily  computable  probability  of  error 
expression. 

Consider a network of n distributed sensors
communicating with a fusion center. Let $\{U_1, U_2, \ldots, U_k\}$
represent the quantized data passed from the sensors numbered 1
through k to the fusion center. Let $\{X_{k+1}, X_{k+2}, \ldots, X_n\}$
represent the observations at the remaining sensors, which are
passed directly on to the fusion center without any quantization.
Let us assume that the $U_i$'s, $i = 1, 2, \ldots, k$, are binary valued and
that the problem is to decide between two hypotheses $H_0$ and
$H_1$. Denoting the density of the ith sensor as $f(x_i \mid H_j)$,
j = 0, 1, and assuming that sensor observations given the
hypothesis are independent and identically distributed, we can formulate an
optimum fusion center test based on a Likelihood Ratio Test
(LRT) [1]. The LRT is given by the following

$C_k D_k \underset{H_0}{\overset{H_1}{\gtrless}} t_k$ (1)

where

$C_k = \dfrac{f(x_{k+1}, \ldots, x_n \mid H_1)}{f(x_{k+1}, \ldots, x_n \mid H_0)}$, and $D_k = \dfrac{P(U_1, \ldots, U_k \mid H_1)}{P(U_1, \ldots, U_k \mid H_0)}$ (2)

and $t_k$ is an appropriate threshold.


II. AVERAGE PROBABILITY OF ERROR
The  average  probability  of  error  corresponding  to  (1) 
can  be  written  as 


*This  work  was  supported  by  BMDIO/IST  and  managed  by  the 
$P_e(k) = P(H_0)\,P\!\left(C_k > \dfrac{t_k}{D_k} \,\Big|\, H_0\right) + P(H_1)\,P\!\left(C_k < \dfrac{t_k}{D_k} \,\Big|\, H_1\right)$ (3)


In many problems of practical interest, sufficient statistics of
fixed low dimensions exist. Hence, the probability sets involving
the $C_k$ in (3) can be replaced by appropriate sets involving the
sufficient statistic. Moreover, the $D_k$ in (3) can take only a
discrete set of values, at most k + 1 different values.
These possible values are $\gamma^j \delta^{k-j}$, j = 0, 1, ..., k, where

$\gamma = \dfrac{P(U_i = 1 \mid H_1)}{P(U_i = 1 \mid H_0)}$, and $\delta = \dfrac{P(U_i = 0 \mid H_1)}{P(U_i = 0 \mid H_0)}$ (4)


Therefore,  the  probabilities  of  the  type  (3)  can  be  very  easily 
computed  as  a  function  of  k  .  Such  computations  are  carried  out 
for  the  case  when  the  density  of  observation  belongs  to  an 
exponential  family. 

III. PERFORMANCE ANALYSIS
Closed form error expressions for gamma, exponential
(for testing scale parameter) and normal (for testing location
parameter) densities are derived. Table 1 shows the ratio of the
error probabilities when n = 5 and the Signal power to Noise power
Ratio (SNR) is 10 dB. $\alpha$ is the shape parameter of the gamma
density. As $\alpha$ increases, the ratio of the error probabilities also
increases.


                  Exponential    Normal    Gamma, α = 3

P_e(2)/P_e(1)         1.2          1.6         2.0

P_e(4)/P_e(1)         1.8          4.4         8.0

Table 1


Numerical results indicate that for normal and gamma
(with large $\alpha$) densities the loss due to quantization is more
significant than for the exponential density.
Abstract - Two pdf models suitable for describing non-Gaussian
iid noise are introduced. The models are used in the
design of a LOD test for detecting weak signals in real
non-Gaussian noise. Results obtained in the context of an
underwater acoustic application are encouraging.

1. INTRODUCTION¹

Conventional signal processing and detection criteria, optimised
in the presence of Gaussian noise, may degrade in non-Gaussian
environments. Higher Order Statistics (HOS) [1] is a powerful
means to analyse non-Gaussian noise and build robust detectors.
This work focuses attention on the problem of optimizing detection
in the presence of additive, iid, stationary, non-Gaussian noise under
the conditions of weak signals (i.e., for low Signal-to-Noise Ratio,
SNR). In order to optimize the Probability of Detection $P_{det}$ for
low SNR values, the selected binary statistical test consists of a
Locally Optimum Detector (LOD) [2], whose test rule is computed
on the basis of new models of the noise univariate probability density
function (pdf) [3]. The investigated models are expressed in terms
of the HOS parameters skewness (of the 3rd order), which
quantifies the deviation from shape symmetry, and kurtosis (of the
4th order), which quantifies the sharpness of a shape. The detector
has been tested in the case of deterministic signals corrupted by
real shipping-traffic noise, acquired during a sea campaign, in the
context of the CEC MAST-I SNECOW project (May 1993) [4].

2.  DESCRIPTION  OF  THE  APPROACH 
The proposed method is based on the statistical analysis of
channel noise. As the LOD requires the analytical model of the noise pdf,
attention is focused on this aspect. The first model is a generic
pdf introduced by Champernowne and used in [3]. It can be
applied if the N noise components have a hyperbolic distribution
of power. In this acoustic application, in which the main noise
components are the ship from which the sensor was dropped
(strong source) and the surrounding traffic ships (which can be
considered equally distributed on the sea, and contribute weakly
to noise), this pdf model is reasonable. It depends on $\beta_2$, the ratio
between the 4th moment and the square of the 2nd moment [3]. A second,
new model is presented, the "asymmetric Gaussian" pdf,
consisting of two Gaussian parts and depending on two second-order
parameters (deriving from the definition of variance), i.e.,
the "left and right variances", which together maintain the same
information provided by the skewness. The non-linear function
$g_{lo}(\cdot)$, in terms of which the likelihood ratio of the LOD rule is
expressed [2], is easily expressed in terms of these two models.
Information added by the HOS-based description is contained in
simple parameters ($\beta_2$, or $\sigma_l$ and $\sigma_r$), and no constraint has to be
satisfied about signal characteristics.
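
To make the second model concrete, the sketch below writes out one plausible form of an "asymmetric Gaussian" pdf with mode at zero and the corresponding locally optimum nonlinearity $g(x) = -f'(x)/f(x)$. The exact parameterisation used by the authors is not given in this abstract, so the normalisation, the zero mode and the function names are assumptions for illustration only.

```python
import math


def asymmetric_gaussian_pdf(x, sigma_l, sigma_r):
    """Two half-Gaussians joined at x = 0 (assumed form).

    sigma_l applies for x < 0 and sigma_r for x >= 0; the common
    normalisation makes the density continuous and integrate to one.
    """
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_l + sigma_r))
    sigma = sigma_l if x < 0.0 else sigma_r
    return norm * math.exp(-x * x / (2.0 * sigma * sigma))


def lo_nonlinearity(x, sigma_l, sigma_r):
    """g(x) = -f'(x)/f(x) for the density above: piecewise linear in x."""
    sigma = sigma_l if x < 0.0 else sigma_r
    return x / (sigma * sigma)
```

Under this assumed form the nonlinearity is a linear correlator weight with a different slope on each side of the mode, which is how the asymmetry enters the detector.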
Community in the context of MAST-I SNECOW Project


3.  Experimental  results  and  future  work 

An extensive test phase was carried out. Noise was acquired in a
coastal shallow-water area. The presence of heavy traffic and of
reflection and refraction makes the detector work in critical
conditions. The LOD performance is summarized in Fig. 1 in
terms of $P_{det}$ vs. SNR. A comparison among the results of the two
proposed pdfs and the Gaussian model is presented.

The tests were carried out by fixing the Probability of False Alarm
$P_{FA} = \alpha = 5\%$. Non-Gaussian real underwater acoustic ship-traffic
noise was characterized by $\mu = 0$, $\beta_2 = 2.84$, $\sigma_l = 1860$, $\sigma_r = 1500$.
The proposed models appear approximately equivalent, as the noise
presents deviation from both Gaussian sharpness and symmetry.
The next investigation step, concerning the model of propagation
through a real shallow-water channel, is going to be carried out.


Fig. 1 Results of the LO detector ($P_{det}$ vs. SNR in dB) under the Champernowne (a), the
asymmetric-Gaussian (b) and the Gaussian (c) hypotheses.
Abstract -

This  paper  handles  the  detection  of  Gaussian  sig¬ 
nals  in  compound-Gaussian  noise.  We  show  that  the 
optimum  detector  is  the  conventional  one  plus  an  esti¬ 
mator  of  the  short-time  noise  power  spectral  density. 

I.  Introduction 

In this paper we consider the problem of detecting one out
of M Gaussian processes with known autocorrelation functions
(acf's) in the presence of non-Gaussian noise: such a
problem is commonly encountered in radio communications
over fading dispersive channels subject to atmospheric noise.
Denoting by $a_1(t), a_2(t), \ldots, a_M(t)$ M complex Gaussian random
processes with given acf's, the detection problem under
study amounts to the following M-ary hypothesis test:

$H_i : r(t) = a_i(t) + c(t)$ (1)

wherein r(t) and c(t) denote the complex envelopes of the received
signal and of the impinging noise, respectively. Such
a noise is modeled as a compound-Gaussian process, namely
as the product of a real, non-negative component, s(t) say,
times an independent, Gaussian, possibly complex process,
g(t). Theoretical considerations, supported by experimental
results, show that, if the correlation time of s(t) is much
smaller than that of g(t), then the model represents a faithful
description of some important noise sources, such as atmospheric
noise and scattering from composite surfaces (see [1]
and references therein). Since the signalling interval is typically
much smaller than the average decorrelation time of
s(t), the modulating process degenerates into a random constant
and the noise process reduces to a Spherically Invariant
Random Process (SIRP).


II. Receiver Design

We focus on the case of uncorrelated noise observations
with Power Spectral Density (PSD) $2\mathcal{N}_0 E[s^2]$ (where $2\mathcal{N}_0$ is
the PSD of the Gaussian component and $E[\cdot]$ denotes statistical
expectation), since, due to the closure of both Gaussian
processes and SIRPs with respect to linear transformations,
the case of correlated noise can be easily handled via a whitening
approach.

Denoting by $\Lambda_G[r(t); 2\mathcal{N}_0 \mid H_i]$ the likelihood functional under
hypothesis $H_i$ for complex, uncorrelated Gaussian noise
with PSD $2\mathcal{N}_0$, the likelihood functionals in the presence of
SIRP can be shown to assume the form

$\Lambda[r(t) \mid H_i] = \lim_{N \to \infty} \Lambda_G[r(t); 2\mathcal{N}_0 \hat{s}_N^2 \mid H_i]$ (2)

wherein $\hat{s}_N^2$ represents a consistent estimator of the random
variable $s^2$ and can be computed by properly processing the
observables. Otherwise stated, since the noise process, as observed
in sufficiently short time intervals, is a conditionally
Gaussian random process, the likelihood functionals coincide
with those for Gaussian noise, provided that the noise
PSD is substituted by an estimate of the short-term PSD (i.e.,
of the conditional noise PSD given s). This fact does not entail
that the conventional detector is optimum under SIRP disturbance,
since the estimator-correlator is to be keyed to the
estimated short-term noise PSD. In any case, we stress here
that the receiver is canonical, in the sense that its structure
is one and the same, independent of the probability density
function (pdf) of the modulating process and, hence, of the
statistics of the noise process.

So far, the structure of the estimator $\hat{s}_N^2$ has been left aside:
interestingly, it can be shown to coincide with the average of
the squared moduli of the projections of the received signal
onto the first N unit vectors of any orthonormal basis of the space
$L^2(0, T)$; as $N \to \infty$, $\hat{s}_N^2$ can be shown to converge in the
mean square sense to the random variable $s^2$.

Choosing the complex exponentials of period T as a basis
yields

$\hat{s}_N^2 = \dfrac{1}{2\mathcal{N}_0 N T} \sum_{k=1}^{N} \left| R_T\!\left(\dfrac{k}{T}\right) \right|^2$ (3)

where $R_T(f)$ is the Fourier Transform of the received signal,
as observed in the interval (0, T): thus, $\hat{s}_N^2$ is an average of
the sampled periodogram of the received signal.

Summing up, the minimum error-probability decision rule
for equally likely signals is written as

decide $H_i$ : $\int_0^T r(t)\hat{a}_i^*(t)\,dt - b_i > \int_0^T r(t)\hat{a}_k^*(t)\,dt - b_k \quad \forall k \ne i$ (4)

wherein $\hat{a}_k(t)$ is the linear minimum mean-square estimate
of the k-th signal in Gaussian noise with PSD $2\mathcal{N}_0 s^2$ and
$b_i = b_i(s^2)$ are proper bias terms, depending on the value of
the noise short-term PSD.


III.  Performance  analysis 

As to the performance of this detector, the analysis of
On-Off Keying (OOK) signalling with exponential correlation
subject to noise with Laplacian pdf demonstrates that the
error probability depends on two parameters, the ratio of the
received energy to the noise long-term PSD and the
time-bandwidth product of the signal, namely the product of the
correlation length times the spectral width of the Gaussian
random process. Interestingly, as for the case of Gaussian
noise, the larger such a product, the better the performance.
Additionally, the noise spikiness seems not to dramatically
affect the performance, even though, as for the case of
non-dispersive channels, increased noise spikiness results in worse
and worse performance, especially in the region of interest of
extremely low error probabilities.
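
The short-term power estimator of Section II (the average squared modulus of the projections, normalised by the PSD of the Gaussian component) can be illustrated with a small simulation. The sketch assumes noise-only observations whose projections onto an orthonormal basis are s times independent complex Gaussian samples with mean-square value 2*N0; the function names and the simulation setup are hypothetical and not taken from the paper.

```python
import math
import random


def estimate_s_squared(coefficients, n0):
    """Average squared modulus of the projections, normalised by 2*N0."""
    total = sum(abs(c) ** 2 for c in coefficients)
    return total / (2.0 * n0 * len(coefficients))


def simulate_sirp_coefficients(s, n0, count, rng):
    """Draw `count` noise-only projections of a SIRP with random constant s.

    Each Gaussian sample has variance N0 per real and imaginary part,
    so its mean-square modulus is 2*N0 before scaling by s.
    """
    sigma = math.sqrt(n0)
    return [s * complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(count)]


rng = random.Random(0)
coeffs = simulate_sirp_coefficients(s=3.0, n0=0.5, count=20000, rng=rng)
s2_hat = estimate_s_squared(coeffs, n0=0.5)  # should be close to s**2 = 9
```

As the number of projections grows the estimate concentrates around $s^2$, which is the consistency property the receiver relies on.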
Abstract - While locally optimum detection requires
complete knowledge of the noise density, we use only the
first few absolute moments of the independent, identically
distributed (iid) noise to obtain a robust detector that is
locally optimum for the least favorable noise satisfying
these moments. This robust detector's efficacy approaches
that of the asymptotically optimum detector, while requiring
limited knowledge of the noise statistics.

I. SIGNAL AND NOISE MODEL

The problem is modeled as deciding between the null
hypothesis $X = W$ and the alternative hypothesis
$X = \theta s + W$, where X is an n-element observation vector,
W is a vector of zero-mean iid noise random variables with
univariate density f, s is a vector of known signal samples
with nonzero, finite asymptotic average power, and
$\theta = K/\sqrt{n}$, for some unknown K > 0.

II.  COMPLETELY  KNOWN  NOISE  STATISTICS 

The locally optimum (LO) detector of a known signal in iid
noise is a memoryless nonlinearity followed by a correlator
[1]. The memoryless nonlinearity g depends on the noise
density through $g(x) = -f'(x)/f(x)$, where $f'(x) = df(x)/dx$.
When the noise is zero-mean Gaussian with unit variance,
$g(x) = x$ and the LO detector is a linear correlator. A
generalization of the LO detector is a nonlinear correlator
where g is any function satisfying mild regularity
conditions. A common example is the sign correlator, whose
nonlinearity is the signum function.

Efficacy $\eta(g, f)$ is an asymptotic measure for predicting
detection performance. In the asymptotic case, $n \to \infty$,
which implies $\theta \to 0$. The asymptotic LO detector is
equivalent to the asymptotically optimum (AO) Neyman-Pearson
detector, and its efficacy is equal to the Fisher
information $I(f)$. $\eta(g, f)$ is concave in g and convex in f
and satisfies the saddle point inequalities
$\eta(g, f_0) \le \eta(g_0, f_0) = I(f_0) \le \eta(g_0, f)$, where $g_0 = -f_0'/f_0$
for some density $f_0$. At the saddle point, efficacy is equal to
Fisher information.
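
The comparison between linear and sign correlators can be checked numerically. The abstract does not state the efficacy formula, so the sketch below assumes the standard normalised form $\eta(g, f) = \left(\int g(x)(-f'(x))\,dx\right)^2 / \int g^2(x) f(x)\,dx$ for a nonlinearity with zero mean under f, and evaluates it by the trapezoid rule for unit-variance Gaussian noise. The integration limits, step count and function names are choices made for this example.

```python
import math


def gaussian_pdf(x):
    """Unit-variance, zero-mean Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_pdf_derivative(x):
    """Derivative of the unit Gaussian density: f'(x) = -x f(x)."""
    return -x * gaussian_pdf(x)


def efficacy(g, pdf, pdf_derivative, lo=-10.0, hi=10.0, steps=20000):
    """Assumed normalised efficacy, integrated with the trapezoid rule."""
    h = (hi - lo) / steps
    num = 0.0
    den = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        num += w * g(x) * (-pdf_derivative(x)) * h
        den += w * g(x) ** 2 * pdf(x) * h
    return num * num / den


def sign(x):
    """Signum function used by the sign correlator."""
    return (x > 0) - (x < 0)


eta_linear = efficacy(lambda x: x, gaussian_pdf, gaussian_pdf_derivative)
eta_sign = efficacy(sign, gaussian_pdf, gaussian_pdf_derivative)
```

For Gaussian noise the linear correlator attains the Fisher information (efficacy near 1), while the sign correlator comes out near $2/\pi$, consistent with the saddle point inequality when $f_0$ is Gaussian.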

III.  PARTIALLY  KNOWN  NOISE  STATISTICS 

Only the first few absolute moments of the noise are assumed
known. The admissible set of absolutely continuous
densities is $\mathcal{F} = \{ f \mid \int |x|^j f(x)\,dx = \nu_j,\ j = 1, 2, \ldots, J \}$, where
J is typically 2 or 3. It can be shown that there exists a least
favorable density $f_{LF} \in \mathcal{F}$ such that $I(f_{LF}) = \inf I(f)$ over all
$f \in \mathcal{F}$. Since it is difficult to analytically determine $f_{LF}$, a
Gram-Charlier series approximation [2] is used to model the
noise densities in the admissible set. Many terms are used to
develop good Gram-Charlier series approximations for the
$f \in \mathcal{F}$. Constrained numerical optimization is used to find
the series coefficients that determine the least favorable
density. The robust detector is a nonlinear correlator with
nonlinearity $g_{LF} = -f_{LF}'/f_{LF}$ that is LO for this least
favorable noise density.


IV.  NUMERICAL  RESULTS 

Our results demonstrate that only the first few absolute moments
of the noise are needed to approach the performance
of the AO detector derived with full knowledge of the noise
density. Apparently, these low-order absolute moments are
most influential in determining the shape of the density
about the mode, and therefore in shaping the nonlinearity at
values most often occupied by the noise. We have used the
first two and three absolute moments of Gaussian-Gaussian
mixture (GGM) and Johnson distributions to derive the LO
nonlinearity (NL) $g_{LF}$ for the least favorable noise density.
The efficacy of the resulting asymptotically robust detector,
$\eta(g_{LF}, f)$, is only slightly less than that of the AO detector
and is significantly greater than that of the asymptotic linear and
sign correlators. Fig. 1 shows efficacy results for detectors
in one example: the density $f = f_{GGM}$ is from a unit-variance
Gaussian-Gaussian mixture class with a contamination
parameter of 0.05, and the first two absolute moments of
$f_{GGM}$ are used to obtain $f_{LF}$ and hence $g_{LF}$. Fifty terms
were used in the Gram-Charlier series. The abscissa in
Fig. 1 is the ratio of the contamination variance to the nominal
variance of the two Gaussian distributions comprising
the Gaussian-Gaussian mixture. When this ratio is one, the
noise is Gaussian. The results are computed for correlators
preceded by a linearity, sign NL, robust NL, and GGM LO
NL using GGM noise. The robust NL's performance is also
shown for the least favorable noise for which the robust NL
is LO; while this noise satisfies the moment constraints, it is
not GGM. Clearly, the performance of the robust detector
approximates that of the AO detector, far exceeding that of a
linear correlator or a sign correlator.


(Contamination  Variance)/(Nominal  Variance) 

Fig.  1.  Comparison  of  Detector  Efficacy  for  Example 
Abstract - In this paper we examine some new methods of
conditional testing in two-input signal detection with
conditioning on one of the inputs. With this method, the number of
samples is considerably reduced.

I.  INTRODUCTION 

In a conditional test the threshold and randomization
probability of a threshold test are not taken to be fixed
parameters independent of the data but are directly dependent on
the specific data set being analyzed for a test of hypotheses. In
our paper, conditional testing in two-input detectors is performed
by using only one of the inputs. Asymptotic Relative Efficiency
(ARE) has been computed with respect to Generalized Cross
Correlation (GCC), e.g. [1]. This method is also applied to the
two-input optimum Three Level Coincidence (TLC) correlator,
e.g. [2].

II. DETECTOR STRUCTURE

Consider a binary problem with a null hypothesis H and an
alternative hypothesis K, and let $X_n = (X_1, X_2, \ldots, X_n)$ and
$Y_n = (Y_1, Y_2, \ldots, Y_n)$ denote the n-component random
observation vectors. A fixed threshold test for H against K is
compared with test function $T(X_n, Y_n)$. The block diagram of our
detector is shown in Fig. 1.


f and h are the pdfs of the noise inputs. The threshold is a function of
$X_n$ or $Y_n$. When $X_n$ is passed through the block A(X, c),
a subvector $X'_m$ formed from $X_n$ comes to the detector,
where m < n and, in the general case, m is a random variable.

III. ASYMPTOTIC PERFORMANCE

If  the  functions  f  and  h  are  even  functions  and 
components  of  X  and  7  a re  iidand  /  Yl)  =  Then 

the  efficacy  of  two  input  conditional  testing  with  single  input 
conditioning  according  Fig.  1  is 


$\eta_{cond} = \dfrac{\left[\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} q(x, y)\left[h''(x)f(y) + 2h'(x)f'(y) + h(x)f''(y)\right]dx\,dy\right]^2}{4\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} q^2(x, y)\,h(x)f(y)\,dx\,dy}$ (1)


IV. Special Cases

Let A(X, c) be a function as in Fig. 2.

Then Fig. 3 shows the ARE of the conditional GCC detector with respect to GCC in
Gaussian noise.

Fig. 3

The ARE of the conditional TLC detector with respect to TLC, e.g.
[2], in Gaussian noise is shown in Fig. 4.

V. Conclusion


For single-input conditional testing in a two-input detector,
when the noise is Gaussian or nearly Gaussian, we have an
appropriate method for detection. Also, the number of
observations to be processed is considerably reduced. The percentage
of this reduction depends on the number of input samples, n, as
shown in Fig. 3 and Fig. 4. However, the percentage reduction in
processing time is much higher than that for the samples. The
details of the method are presented in [3].
Abstract - We consider a signal detection problem
in a continuous-time white Gaussian channel.
The signal is assumed to be a stationary Gaussian
process. We prove that the error probability in
the signal detection tends to zero exponentially
fast, as the observation time goes to infinity.


Summary

The aim is to study the exponential-type asymptotic
behavior of the error probabilities of the signal detection
in a continuous-time white Gaussian channel (WGC).
The model of a WGC is presented by

$Y(t) = \int_0^t X(s)\,ds + B(t)$, $t \in [0, T]$,

where {B(t)} is a Gaussian white noise, and X(t) and Y(t)
are a channel input and the corresponding output, respectively.
The signal detection problem consists of deciding,
based on the observation of the output {Y(t)},
whether the signal {X(t)} is sent or not. In other words,
we consider the problem of testing two hypotheses

$H_0 : Y(t) = B(t)$, $t \in [0, T]$,

$H_1 : Y(t) = \int_0^t X(s)\,ds + B(t)$, $t \in [0, T]$.

Two probabilities of error are defined by

$e_0(T) = \Pr(\{Y(t)\} \notin S \mid H_0 \text{ is true})$,
$e_1(T) = \Pr(\{Y(t)\} \in S \mid H_1 \text{ is true})$,

with a decision region $S \subset \mathbb{R}^{[0,T]}$. A Neyman-Pearson
test is a test given by a decision region of the form

$S_T(u) = \left\{ y : \dfrac{1}{T}\log\dfrac{d\mu_1^T}{d\mu_0^T}(y) < u \right\}$,

where $\mu_i^T$ is the probability distribution of {Y(t)} under
the hypothesis $H_i$. It is well known that Neyman-Pearson
tests are optimal to minimize $e_1(T)$, where u is chosen
so that $e_0(T) = \mu_0^T(S_T(u)^c)$. In this case, $e_1(T) = \mu_1^T(S_T(u))$.

We assume that the signal {X(t)} is a regular stationary
Gaussian process with spectral density function
(SDF) f. Note that {B(t)} is a generalized stationary
process with SDF $f_0(\lambda) = 1/(2\pi)$ and that, under
$H_1$, {Y(t)} is a stationary process with SDF $f_1(\lambda) = f(\lambda) + 1/(2\pi)$.
To state the result we define a SDF $f_\theta$ by

$1/f_\theta(\lambda) = (1 - \theta)/f_0(\lambda) + \theta/f_1(\lambda)$.

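We define, for SDFs f and g, H(f; g) by

$H(f; g) = \dfrac{1}{4\pi}\int_{-\infty}^{\infty}\left\{\dfrac{f(\lambda)}{g(\lambda)} - 1 - \log\dfrac{f(\lambda)}{g(\lambda)}\right\}d\lambda$.

We can show that $H(f_\theta; f_0)$ is the relative entropy (or
information divergence) of a stationary Gaussian process
with SDF $f_\theta$ with respect to the white noise {B(t)}.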
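
A numerical sketch may help fix ideas about the divergence between two spectral densities. It assumes the form $H(f; g) = \frac{1}{4\pi}\int \{f/g - 1 - \log(f/g)\}\,d\lambda$, truncates the integral to a finite band, and uses a hypothetical Lorentzian signal SDF $f(\lambda) = (1/\pi)/(1 + \lambda^2)$ that is not taken from the paper; the function names are likewise illustrative.

```python
import math


def divergence_rate(f, g, lam_max=100.0, steps=100000):
    """Trapezoid-rule estimate of H(f; g) over [-lam_max, lam_max]."""
    h = 2.0 * lam_max / steps
    total = 0.0
    for k in range(steps + 1):
        lam = -lam_max + k * h
        ratio = f(lam) / g(lam)
        w = 0.5 if k in (0, steps) else 1.0
        total += w * (ratio - 1.0 - math.log(ratio)) * h
    return total / (4.0 * math.pi)


def f0(lam):
    """SDF of the white noise, 1/(2*pi)."""
    return 1.0 / (2.0 * math.pi)


def signal_sdf(lam):
    """Hypothetical Lorentzian signal SDF (an assumption for this example)."""
    return (1.0 / math.pi) / (1.0 + lam * lam)


def f1(lam):
    """SDF of the output under H1: signal SDF plus white noise SDF."""
    return signal_sdf(lam) + f0(lam)


h_10 = divergence_rate(f1, f0)  # divergence of the H1 spectrum from white noise
```

The integrand is non-negative and vanishes only where the two densities agree, so the quantity is zero for identical spectra and positive otherwise.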

Concerning  the  exponential-type  asymptotic  behavior 
of  the  error  probabilities,  we  can  prove  the  following  the¬ 
orem. 


Theorem 1 Assume that the SDF f is continuous.
Then, for any $\alpha > 0$, there exists a constant $u_\alpha$ such
that

$\lim_{T \to \infty} T^{-1}\log\mu_0^T(S_T(u_\alpha)^c) = -\alpha$.

If $0 < \alpha = H(f_\theta; f_0) < H(f_1; f_0)$ $(0 < \theta < 1)$, then

$\lim_{T \to \infty} T^{-1}\log\mu_1^T(S_T(u_\alpha)) = -H(f_\theta; f_1)$.

li  a  -  H(fe;f0)  >  (9  >  1),  then 

^J-Mogfi  -  (sT(uay)}  = 


The  proof  is  based  on  a  large  deviation  theorem. 

In  discrete-time  cases,  the  asymptotic  behavior  of  error 
probabilities in hypothesis testing has been studied [1, 2].

I. Introduction

We address the problem of optimal detection of a random signal transmitted over a time-varying frequency-selective correlated Rayleigh fading channel. We present a general recursive solution which may be operated at full complexity to provide optimal detection or at reduced complexity, using Per-Survivor Processing (PSP) techniques [1], to yield a suboptimal receiver. An alternative full-complexity solution based on the innovations approach may be found in [2].

II.  Proposed  Recursive  Receiver 

To derive the optimal receiver structure, we adopt a discrete-time representation of the received signal obtained by Nyquist sampling (let $\beta$ be the resulting oversampling factor). We denote with $\{r_n\}$ the samples of the received signal (sufficient statistics), $\{a_k\}$ the information sequence and $L_{ISI}$ the intersymbol interference length in symbol periods. We also denote with $L_c$ the channel coherence time expressed in symbol periods and assume it finite. Let $L = L_{ISI} + L_c$ and define the following vectors:

$r_{i\beta} = (r_1, r_2, \ldots, r_{i\beta})$, $a_i = (a_{1-L}, a_{2-L}, \ldots, a_0, a_1, \ldots, a_i)$,

$r'_{i\beta} = (r_{(i-L_c)\beta+1}, r_{(i-L_c)\beta+2}, \ldots, r_{i\beta})$, $a'_i = (a_{i+1-L}, \ldots, a_i)$,

$r''_{i\beta} = (r'_{(i-1)\beta} : r_{(i-1)\beta+1}, r_{(i-1)\beta+2}, \ldots, r_{i\beta})$, $a''_i = (a'_{i-1} : a_i)$,

where $(\cdot : \cdot)$ denotes vector concatenation.

Optimal detection requires performing the maximization $\hat{a}_i = \arg\max_{a_i} p(r_{i\beta} \mid a_i)$, where $p(r_{i\beta} \mid a_i)$ is the conditional Probability Density Function (PDF) of $r_{i\beta}$ given $a_i$. Because of the assumed channel model, this PDF is multivariate zero-mean Gaussian. Due to the limited channel coherence time $L_c$, it is possible to factorize this PDF. By the second Bayes theorem, each factor may be expressed as a ratio of the PDFs of the partial observation vectors $r'_{(k-1)\beta}$ and $r''_{k\beta}$, which do not depend on the complete data sequence but only on $a'_{k-1}$ and $a''_k$, respectively. The parameter $L$, which is the length of the vector $a'_{k-1}$, plays the role of an overall channel memory, as pointed out also in [2].

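Making use of the correlation matrices $R_{r'}(a'_k) = E[r'^{H}_{k\beta} r'_{k\beta} \mid a'_k]$ and $R_{r''}(a''_k) = E[r''^{H}_{k\beta} r''_{k\beta} \mid a''_k]$, we can express the likelihood function (path metric) to be minimized as:

$\Lambda(a_k) = \sum_{k} \left\{ \log \frac{\det R_{r''}(a'_{k-1} : a_k)}{\det R_{r'}(a'_{k-1})} + r''_{k\beta} R^{-1}_{r''}(a'_{k-1} : a_k)\, r''^{H}_{k\beta} - r'_{(k-1)\beta} R^{-1}_{r'}(a'_{k-1})\, r'^{H}_{(k-1)\beta} \right\} \qquad (1)$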

where  det(-)  denotes  the  determinant  of  a  matrix  and  [*]H 
is  the  Hermitian  operator.  The  above  minimization  can  be 
performed  by  searching  the  optimum  path  in  a  trellis  diagram 
whose  state  is  defined  as  pk  ==  This  search  may  become 
prohibitive for highly correlated channels, since the number of trellis states might be very high ($M^L$, if $M$ is the number of constellation symbols).

An alternative suboptimal solution is offered by well-known PSP techniques. We define a reduced state $\tilde{\mu}_k = (a_{k-K+1}, a_{k-K+2}, \ldots, a_k)$, where $K$ ($1 \le K \le L$) is an integer which controls the degree of desired complexity reduction, and a vector $\tilde{a}'_k = (\tilde{a}_{k+1-L}, \ldots, \tilde{a}_{k-K} : \tilde{\mu}_k)$, in which $\{\tilde{a}_i;\ i = k-K, \ldots, k+1-L\}$ denote information symbols associated with the survivor of state $\tilde{\mu}_k$. The resulting path metric is formally identical to (1) after substituting $a'_{k-1}$ with $\tilde{a}'_{k-1}$.

[Figure: BER versus $E_b/N_0$ (dB).]

III.  Numerical  Results  and  Conclusions 

The performance of the proposed receivers is assessed in terms of Bit Error Rate (BER) versus $E_b/N_0$ ($E_b$ is the bit energy averaged over channel and data statistics). The overall channel is a symbol-spaced ($\beta = 1$) finite impulse response filter with three independent taps, modeled as first order autoregressive (the forgetting factor is 0.998). For a QPSK modulation format, blocked transmission with blocks of 60 symbols is assumed. A preamble and a tail, both of 2 symbols, are used.

In the figure, the BER of the proposed detectors is compared to lower and upper bounds derived as in [3]. Complexity savings ($M^K$ instead of $M^L$ trellis states) may be achieved with the proposed suboptimal algorithms based on PSP at the expense of a moderate performance loss (compare the performance when $L = 5$ for $K = 5, 4$). Furthermore, for an equal number of trellis states ($K = 5$), PSP allows the performance to be improved significantly by increasing the assumed channel memory from $L = 5$ to 6 and 7. In three cases the proposed receiver performance lies between the lower and upper bounds.
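
To make the complexity comparison concrete, the short sketch below counts trellis states for the full-complexity receiver (memory L) and the PSP-reduced receiver (memory K). The function name `trellis_states` is illustrative only; the numbers use the QPSK setting (M = 4) and the values L = 5, K = 4 quoted above.

```python
def trellis_states(constellation_size: int, memory: int) -> int:
    """Number of trellis states when the state holds `memory` symbols,
    each drawn from a constellation of `constellation_size` points."""
    return constellation_size ** memory


M = 4                                 # QPSK, as in the numerical results
full_states = trellis_states(M, 5)    # full-complexity receiver, L = 5
psp_states = trellis_states(M, 4)     # PSP-reduced receiver, K = 4
print(full_states, psp_states)
```

Under these assumptions the reduced receiver searches a trellis with a quarter of the states of the full one.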

Abstract - New robust detection algorithms have been developed for detection of pulse signals in the presence of random noise and random pulse interferences. The algorithms are designed under the assumption that a priori information on the compactness of the useful pulse signals is known. It is shown that estimation of the noninformative signal parameters decreases the detection quality. A numerical simulation of the proposed algorithm is carried out.

I.  Introduction 

The  problem  of  detecting  pulse  signals  in  the  presence  of  a 
random  noise  and  random  pulse  interferences  is  of  great 
significance  for  time  synchronization  channels  of  TDMA 
systems,  for  radionavigation  (VOR/DME)  and  radars  [1,2,3]. 
The  presence  of  pulse  interferences  can  lead  to  appearance  of 
outliers  at  the  input  of  a  signal  detector.  The  outliers 
considerably  complicate  the  solution  of  the  problem  [4]. 

As  the  input  the  detector  is  assumed  to  use  time  delay  of  the 
received  signal  which  can  be  written  and  stored  in  a  detector 
memory.  Thus  in  the  time  domain  the  problem  of  robust  signal 
detection  can  be  formulated  in  the  following  way: 

$H_0 : x_i = s_i,$
$H_1 : x_i = \gamma_i(\theta_i + v_i) + (1 - \gamma_i) s_i, \qquad (1)$

where $x_i$ is the observation (time delay), $s_i$ is the time delay of the interference impulse, $\gamma_i$ is a random sequence taking the value 1 when $x_i$ belongs to the informative (signal) set and 0 when $x_i$ belongs to the noninformative (interference) set, and $v_i$ is the time delay estimation error due to random noise. The conditional probability density functions (pdf) $f_1(x/\theta_i, \gamma_i = 1)$ and $f_2(x/\gamma_i = 0)$ are assumed to be either normal or exponential.

The dynamics of the time delay of the received signal can be described by the difference equation

$y_k = \Phi y_{k-1} + \omega_k$, $\theta_k = H_k y_k$, where all symbols have their usual meaning.

II.  Robust  Detection  Algorithms 
For  solving  the  problem  it  is  necessary  to  calculate  the 
generalized  likelihood  ratio  (GLR) 

$l(x_n/\Theta, \Gamma) = f(x_n/\Theta, \Gamma, H_1)/f(x_n/H_0)$, where

$\Theta = [\theta_1, \ldots, \theta_n]^T$ is the vector of informative parameters and $\Gamma$ is the vector of noninformative parameters.

The  detection  statistics  can  be  obtained  either  by  averaging  the 
GLR  by  all  possible  values  of  noninformative  parameters  or  by 
estimation  of  them  (the  case  of  classification  of  the  received 
signal).  In  both  cases  first  of  all  it  is  necessary  to  estimate  the 
informative  parameter  vector  that  is  to  estimate  time  delay  of 
the  received  signals.  The  problem  is  complicated  by  the 
presence  of  outliers  in  the  observations.  In  this  paper  we 
developed  a  fixed- interval  smoothing  algorithm  on  the  basis  of 
the  invariant  embedding  method  [5].  This  algorithm  showed  a 
high  accuracy  of  the  estimates  and  it  consists  of  two  nonlinear 
Kalman  filters  of  which  one  is  a  backward  filter.  A  matrix  gain 


of the filter depends on the a posteriori probabilities $P(\gamma_i = j/x_i)$, $j = 0, 1$.

As  was  mentioned  above  one  way  of  developing  an  optimal 
detection  statistics  is  averaging  the  GLR  by  all  noninformative 
parameters.  It  is  easy  to  show  that  in  this  approach  the 
likelihood  ratio  logarithm  can  be  written  as: 

$h(X_n) = \sum_{i=1}^{n} \ln\left[1 - P(\gamma_i = 1/x_i, \Theta)\right] \qquad (2)$

The  statistics  (2)  is  optimal  for  given  vector  ©  .  The  other  way 
is  to  estimate  the  noninformative  parameters  of  the  received 
signal.  Two  possible  situations  which  can  be  encountered  in 
practice  have  been  considered: 

1). If the number of signal samples $q$ is known, we can classify as signal those samples which have the maximum values of $P(\gamma_i = 1/x_i, \Theta)$. Then it follows that

h(Xn)  =  Z  =j,e)/f2(xj/yj  =  0)] ;  (3) 

j= i  L  J 

2) .  For  unknown  number  of  signal  samples  q  the  estimate  of 
the  vector  T  can  be  found  as  a  maximum  of  the  a  posteriori 
probability  P(T  /  x ,©  ) .  In  this  case  the  expression  (3)  can  be 
written  in  approximate  form 

h(Xn)=hi-  (4) 

1=1 

III.  Conclusion 

The  computer  modelling  of  proposed  algorithms  for  a 
radionavigational  system  was  carried  out  for  Gaussian  and 
Laplace  pdf  of  contaminated  observations.  The  best  results 
were  obtained  for  Laplace  pdf  because  of  the  great  contrast 
between  pdf  of  normal  measurements  and  outliers.  The 
algorithms  (2)  and  (3)  showed  a  higher  efficiency  in 
comparison  with  a  nonparametric  (median)  signal  detector 
which  is  usually  used  in  such  a  situation.  The  algorithm  (4) 
had  practically  the  same  characteristic  as  the  median  detector. 
It  should  be  noted  that  all  proposed  algorithms  are  sensitive  to 
a  priori  information  on  probability  p. 

Abstract - Time-varying mappings are used in place
of  a  stationary  mapping  to  improve  the  performance 
of  Euclidean-space  codes  on  ISI  channels. 

I.  Introduction 

A Euclidean-space (ES) code that utilizes a group code designed over the group $G$ is mapped to a signal set $S$ of QAM modulation waveforms by a stationary mapping $\mu : G \to S$. The underlying group code is described by an encoder of finite-length generator sequences [1]. The QAM system used on a channel with ISI can be equivalently represented as a discrete-time (DT) ISI channel with AWGN [2]. The combination of an ES code with a DT ISI channel can be viewed as a more complex composite ES code. A Viterbi decoder is the nearly-optimal ML decoder. Typically, there is a significant reduction in the free distance for the resulting composite ES code compared to the $d^2_{free}$ for the memoryless channel. A technique known as TH-precoding has been used to regain some of the loss by performing the inverse of the DT ISI channel in the transmitter along with a modulo power constraint [3]. This technique requires that the transmitter has exact knowledge of the ISI channel through a feedback channel from the receiver.

II. Time-Varying Mappings

An alternative proposed method for coding on a $k$th-order DT ISI channel is to use an ES code that has time-varying mappings $\mu_i : G \to R_i(S)$, where $\mu_i = R_i \circ \mu$. These codes will be called TVMES codes. For a specific channel and stationary ES code, there typically exists an ordered collection of mappings $R_i$ that regains some of the loss in $d^2_{free}$. In many cases, the performance is better than that of the TH-precoding technique, but at the cost of an exponentially more complex Viterbi decoder which requires synchronization. The transmitter does not require exact knowledge of the ISI channel, so a more robust code can be designed over a range of possible channels. Implicit knowledge of the range is necessary to find the best combination of group code and mappings for the range.

III. Restrictions on the Time-Varying Mappings

Just as an exhaustive search is required to find the best ES code on the memoryless channel, the TVMES codes require an additional search over all ordered collections of mappings for each ES code. To make this search manageable, restrictions are necessary for the type of mapping $R_i$ that is permitted, and restrictions are necessary for the form of the ordered collection of mappings. This is a current area of research. The most severe restrictions are that the collection be of the form of incremental powers of a single unitary transformation, $\mu_i = R^i \mu$. This will be called a rotating (or reflecting) ES code (RESC). RESC codes have shift-invariant distances on the $k$th-order DT ISI channel, but more importantly, the problem of finding the best unitary transformation can be set up as an unconstrained optimization problem. A Newton-Raphson type algorithm can be used to solve for the pseudo-globally best unitary transformation for a given DT ISI channel and a given ES code. This severe restriction on the time-varying mappings actually includes many other types of collections because TVMES codes do not have a unique representation. Many good codes have been seen for small order DT ISI channels. Several specialized techniques for designing a code for the ISI channel can be generalized as TVMES codes.
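
As a rough illustration of a mapping built from incremental powers of a single unitary transformation, the sketch below assumes the group is the cyclic group of order 4, the stationary mapping sends it to the four unit-circle PSK points, and the transformation is a plane rotation by an angle `phi`. All of these choices, and the function names, are assumptions made for illustration; they are not taken from the abstract.

```python
import cmath
import math


def base_mapping(g: int, order: int = 4) -> complex:
    """Hypothetical stationary mapping: element g of Z_order -> PSK point."""
    return cmath.exp(2j * math.pi * (g % order) / order)


def rotating_mapping(g: int, i: int, phi: float, order: int = 4) -> complex:
    """Time-varying mapping at time i: the i-th power of a rotation by phi
    applied to the stationary mapping."""
    return cmath.exp(1j * phi * i) * base_mapping(g, order)


# Signal points for group element 1 at times 0..3 with a 0.3 rad rotation.
points = [rotating_mapping(1, i, 0.3) for i in range(4)]
print(points)
```

Because a rotation is an isometry, distances between signal points at a fixed time index are the same as for the stationary mapping; only the relation between consecutive symbols seen through the ISI channel changes.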

Abstract - An algebraic characterization is given for the groups that can appear as the branch group (≈ trellis section) of some convolutional code over a group (ring, field).


A convolutional code over a group $G$ is basically a shift-invariant subgroup of $G^{\mathbb{Z}}$ (subject, perhaps, to some further conditions such as controllability and observability, or completeness) [1] [2]. The trellis of such a code consists of identical sections, each of which is a triple $(G, S, B)$, where $S$ is the state space (or state group) and the branches $B$ are a subgroup of $S \times G \times S$.

The  standard  “algorithm”  to  construct  convolutional  codes 
over  a  field  goes  as  follows: 

i.  Choose  a  configuration  of  shift  registers  (cf.  Fig.  1). 

ii.  Choose  a  linear  mapping  from  the  shift  registers  into 

F2n. 

Note that the configuration of shift registers corresponds to the projection of the above group $B$ onto $S \times S$.

The attempt to generalize this “algorithm” to codes over general groups leads to the problem of characterizing those groups that can appear as (the projection onto $S \times S$ of) $B$. Our main result is the following characterization of such groups.

Definition: A shift structure $(H_0, H_1, \ldots, H_\ell; \varphi)$ for a group (module, vector space) $H$ consists of a collection $H_0, H_1, \ldots, H_\ell$ of normal subgroups (submodules, subspaces) of $H$ (that need not be disjoint) together with an isomorphism $\varphi$ from $H/H_\ell$ onto $H/H_0$ such that

i. $H_0 * H_1 * \cdots * H_\ell = H$;

ii. $(H_0 * H_1 * \cdots * H_j) \cap (H_j * H_{j+1} * \cdots * H_\ell) = H_j$ for $0 \le j \le \ell$;

iii. $\varphi(H_j * H_\ell) = H_{j+1} * H_0$ for $0 \le j < \ell$.

Main Theorem: Every strongly controllable, shift-invariant group code over any group (module, vector space) $G$ can be found by the following “algorithm”:

i. Choose a group $H$ with a shift structure $(H_0, H_1, \ldots, H_\ell; \varphi)$.

ii. Choose a homomorphism $\omega : H \to G$.

iii. Construct the trellis $(G, S, B)$ with states $S = H/H_0$ and branches

$B = \{(h * H_0,\ \omega(h),\ \varphi(h * H_\ell)) : h \in H\}.$

For Euclidean-space codes, $G$ need not be specified a priori. In this case, step (ii) may be replaced by

ii. Choose a homomorphism $\omega$ from $H$ into the isometry group of $\mathbb{R}^n$.


Figure 1: shift register.


The simplest example of a group (module, vector space) with a shift structure is a direct product

$U = U_0 \times U_1^2 \times \cdots \times U_\ell^{\ell+1}, \qquad (1)$

where $U_0, U_1, \ldots, U_\ell$ are groups (modules, vector spaces) and where the terms $U_j^{j+1}$ are themselves direct products. Such a group may be represented as a collection of delay lines as in Fig. 1, where the mapping $\varphi$ may be interpreted as the shift operator. Note that the corresponding class of group codes includes all (strongly controllable) convolutional codes over any field.

If $H$ is an arbitrary group with a shift structure, it can be shown that a (set-theoretic) one-to-one correspondence exists between $H$ and a group of the type (1) such that $\varphi$ corresponds to the shift operator in Fig. 1. In general, however, this one-to-one correspondence is not an algebraic isomorphism; in other words, the shift register of Fig. 1 is equipped with an algebraic structure other than the “natural” direct product.

So  far,  we  have  found  just  one  class  of  groups  with  a  non¬ 
standard  (i.e.,  not  the  direct-product)  shift  structure:  (multi¬ 
plicative)  groups  of  matrices  with  ones  in  the  main  diagonal 
and  zeros  above  the  main  diagonal.  By  the  main  theorem, 
these  groups  give  rise  to  a  whole  new  class  of  noncommuta- 
tive  group  codes.  (Some  such  codes  seem  closely  related  to 
certain  codes  from  [3].) 

Given an encoding matrix over some field, various criteria are known to check minimality (cf. [1], [2], [3]). Most of these criteria apply to encoders of a particular class, e.g., basic encoders or systematic encoders, and only a few criteria are general in the sense that they apply to arbitrary rational encoding matrices. In this paper, causal rational encoders over commutative rings are considered and a general criterion of Johannesson and Wan [2] is generalized to rings which satisfy the descending chain condition. Moreover, a new simple test is presented that reduces the minimality question from the ring to the field case. The basis for these new minimality tests is the concept of minimality of group systems and convolutional codes as presented in [4] and [5].

Let $R$ be a commutative ring and let $R[D]$ denote the ring of polynomials over $R$. The ring of rational functions over $R$ is defined by

$R(D) = \{ D^m f(D)/s(D) \mid f(D), s(D) \in R[D],\ s(0) = 1 \text{ and } m \in \mathbb{Z} \}.$

A $k \times n$ matrix $G(D)$ over $R(D)$ is called a rational $(n, k)$ encoding matrix over $R$ if it has $k$ linearly independent rows over $R(D)$, or equivalently, if its kernel is zero. The matrix $G(D)$ is called causal (or realizable) if all its components are causal rational functions, i.e., they have an expansion as formal power series in $D$. Every rational $(n, k)$ encoding matrix $G(D)$ gives rise to an $(n, k)$ convolutional code over $R$, which is defined by $C = \{u(D)G(D) : u(D) \in R(D)^k\}$.

To  every  convolutional  code  C,  one  can  associate  a  canoni¬ 
cal  state  space  Sc  that  depends  only  on  the  code  and  not  on  a 
particular  encoding  matrix  for  C  (cf.[4],  [5]).  A  causal  encod¬ 
ing  matrix  G(D)  is  said  to  be  minimal ,  if  the  abstract  state 
space  of  G(D)  is  isomorphic  to  the  canonical  state  space  Sc 
of  the  code  C,  which  is  generated  by  G(D).  In  case  of  a  finite 
alphabet,  this  definition  is  equivalent  to  the  usual  notion  of 
minimality,  which  states  that  the  encoder  G(D)  requires  the 
least  number  of  states  among  all  encoders  that  generate  the 
code  C. 

Johannesson and Wan have presented the following general minimality criterion for the field case [2]. A causal encoding matrix $G(D)$ is minimal if and only if $G(D)$ has a polynomial right inverse in $D$ and a polynomial right inverse in $D^{-1}$. This criterion cannot be generalized to arbitrary commutative rings because one can show that it does not hold over the ring of integers. However, there is a suitable class of rings to which the criterion can be extended, namely, the class of commutative rings satisfying the descending chain condition (DCC). The DCC is a rather weak restriction for practical purposes because most encoding alphabets are finite and every finite ring satisfies the DCC. There exists an important structure theorem for commutative rings satisfying the DCC, which can be viewed as an extension of the Chinese Remainder Theorem (cf. Chap. 7.10 of [6]). Such a ring decomposes into

$R = R_1 \oplus R_2 \oplus \cdots \oplus R_s, \qquad (1)$

where the $R_i$'s are local rings satisfying the DCC. In particular, it follows from (1) that $R$ has only a finite number of maximal ideals $I_1, I_2, \ldots, I_s$.

Theorem 1 Let $R$ be a commutative ring satisfying the DCC and let the maximal ideals be denoted by $I_1, I_2, \ldots, I_s$. Let $G(D) \in R(D)^{k \times n}$ be a causal encoding matrix. Then the following statements are equivalent:

(i) $G(D)$ is minimal;

(ii) $G(D)$ has a polynomial right inverse in $D$ and a polynomial right inverse in $D^{-1}$;

(iii) for all $i = 1, \ldots, s$, the reduction of $G(D)$ modulo $I_i$ is minimal over the field $R/I_i$.

Condition  (ii)  of  this  theorem  extends  the  Johannesson/ Wan 
criterion  to  the  ring  case.  Condition  (iii)  gives  a  new  minimal¬ 
ity  test  that  reduces  the  question  of  minimality  from  the  ring 
to  the  field  case.  It  is  illustrated  by  the  following  example. 

Example 1 Consider the following encoding matrix over the ring of integers modulo 4, $\mathbb{Z}_4$, given by

$G(D) = \frac{1}{1+D} \cdot [\; 1 + D \quad 1 + 2D + 3D^2 \;].$

Reducing $G(D)$ modulo the only maximal ideal $(2) \subset \mathbb{Z}_4$, one obtains the binary encoding matrix

$\bar{G}(D) = [\; 1 \quad 1 + D \;],$

which is minimal over the binary field $\mathbb{Z}_4/(2)$. Hence, condition (iii) of the theorem holds and, therefore, $G(D)$ is minimal.
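
The reduction step in Example 1 can be checked mechanically. The sketch below represents polynomials as coefficient lists (lowest degree first), reduces the numerators and the common denominator 1 + D modulo 2, and divides over GF(2). It only verifies the reduction to [1, 1 + D]; it does not test minimality itself. The helper names are illustrative.

```python
def reduce_mod(poly, p):
    """Reduce coefficients modulo p and strip trailing zero coefficients."""
    out = [c % p for c in poly]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out


def divmod_gf2(num, den):
    """Polynomial long division over GF(2); returns (quotient, remainder).
    Both inputs are trimmed 0/1 coefficient lists, lowest degree first."""
    num = num[:]
    quot = [0] * max(len(num) - len(den) + 1, 1)
    while len(num) >= len(den) and any(num):
        shift = len(num) - len(den)
        quot[shift] = 1
        for i, c in enumerate(den):
            num[shift + i] ^= c          # subtraction is XOR in GF(2)
        while len(num) > 1 and num[-1] == 0:
            num.pop()
    return quot, num


# G(D) = 1/(1+D) * [1+D, 1+2D+3D^2] over Z_4
numerators = [[1, 1], [1, 2, 3]]
denominator = [1, 1]

den2 = reduce_mod(denominator, 2)
reduced = []
for poly in numerators:
    q, r = divmod_gf2(reduce_mod(poly, 2), den2)
    assert r == [0]                      # division is exact in this example
    reduced.append(q)
print(reduced)                           # coefficient lists of the binary matrix
```

The second entry reduces to 1 + D because 1 + 2D + 3D^2 becomes 1 + D^2 = (1 + D)^2 modulo 2.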

References 

[1] G. D. Forney, Jr., ‘Convolutional codes I: algebraic structure’, IEEE Trans. Inform. Theory, vol. 16, pp. 720-738, Nov. 1970.

[2] R. Johannesson and Z. Wan, ‘A linear algebra approach to minimal convolutional encoders’, IEEE Trans. Inform. Theory, vol. 39, pp. 1219-1233, July 1993.

[3] G. D. Forney, Jr., ‘Algebraic structure of convolutional codes, and algebraic system theory,’ in Mathematical System Theory, A. C. Antoulas, Ed., pp. 527-558. Springer 1991.

[4] H.-A. Loeliger, G. D. Forney, Jr., T. Mittelholzer, and M. D. Trott, ‘Minimality and observability of group systems’, Linear Algebra & Appl., vol. 205-206, pp. 937-963, July 1994.

[5] T. Mittelholzer, “Minimal encoders for convolutional codes over rings,” in Communications Theory and Applications, Eds. B. Honary, M. Darnell and P.G. Farrell, pp. 30-36, HW Comm. Ltd., 1993.

[6] N. Jacobson, Basic Algebra II, Freeman & Co., San Francisco 1974.




Permutation  decoding  of  group  codes 

Ezio  Biglieri1 

Dipartimento di Elettronica • Politecnico • Corso Duca degli Abruzzi 24 • I-10129 Torino (Italy)
fax: +39 11 5644099 • e-mail: biglieri@polito.it


Abstract - We consider suboptimum decoding of group codes, represented in the form of a set of $n$-vectors whose components are obtained by permuting the components of an initial vector according to a certain group $Q$ of permutations. Permutation decoding consists of the following two steps. First, we decode the received vector by searching for the most likely permutation in the symmetric group $S_n$; next, we select the element in $Q$ closest to the permutation found. Here we focus on the first step. In particular, we show how any group code can be represented as a permutation code, and we determine the minimum value of $n$.

I.  Introduction 

Consider,  for  motivation’s  sake,  decoding  of  a  binary  (n,  k) 
block  code  transmitted  over  the  additive  white  Gaussian 
noise channel with the standard mapping $m : GF(2) \to \mathbb{R}$ defined by $0 \to -1$, $1 \to +1$. Soft decoding is performed by picking the code word closest to the received vector $r$, while hard
decoding  can  be  viewed  as  an  approximation  of  maximum- 
likelihood  decoding  performed  in  two  steps.  First,  one  uses 
preliminary  decision  regions  formed  by  the  orthants  of  Rn, 
thus  obtaining  an  element  y  £  m_1{±l}n.  Next,  algebraic 
decoding  transforms  the  resulting  n -tuple  y  into  a  code  word. 
The  whole  procedure  may  be  viewed  as  an  approximation  of 
the  Voronoi  regions  of  the  code  by  a  union  of  orthants  of  Rn. 

This  procedure  works  because,  while  the  determination 
of  the  one  among  the  Voronoi  regions  in  which  r  is  falling 
is  a  complex  task,  we  can  make  it  easier  by  approximating 
them  by  a  union  of  regions  such  that  it  is  easy  to  determine 
in  which  one  the  received  vector  is  falling.  Here  we  apply 
this  idea  to  group  codes:  their  Voronoi  regions  are  approxi¬ 
mated  by  a  union  of  smaller  regions  with  the  property  that 
determining  the  position  of  r  with  respect  to  them  is  an  easy 


II.  Group  codes 

Group codes are generated as follows. Consider a group $\mathcal{G}$ of $N \times N$ orthogonal matrices which forms a faithful representation of an abstract group $Q$ with $M$ elements, and an “initial vector” $x \in \mathbb{R}^N$, $\mathbb{R}^N$ the Euclidean $N$-dimensional space. A group code $X$ is the orbit of $x$ under $\mathcal{G}$, i.e., the set of vectors $\mathcal{G}x$. By assuming that the only solution of the equation $Gx = x$, $G \in \mathcal{G}$, is $G = I$ (the identity matrix), the code $X$ has $M$ elements. We may thus denote by $x_g$ the code vector associated with $g \in Q$.

With  the  vectors  of  X  transmitted  over  the  additive 
white  Gaussian  noise  channel,  the  optimum  (i.e.,  maximum- 
likelihood)  decoder,  upon  reception  of  the  noisy  vector 

1This  research  was  sponsored  by  the  Italian  National  Research 
Council  (CNR)  under  “Progetto  Finalizzato  Trasporti.” 


$r = x_g + n$, chooses as the most likely transmitted vector the one that yields

$\min_{g} \| r - x_g \|^2. \qquad (1)$

If the group is not endowed with any special structure, decoding (i.e., the solution of (1)) is obtained by exhaustive search among all the candidates $g \in Q$. This requires a number of calculations $\nu_c = NM$ (in fact, $M$ scalar products of $N$ terms each must be computed) and a storage of $\nu_s = NM$ real numbers ($M$ vectors of $N$ components each). In addition to this, the minimum has to be found, which requires $\nu_m$ operations.

III.  Permutation  decoding 
We call Permutation Signal Set (PSS) a set of vectors that are obtained by applying a group $Q$ of permutations $\pi$ to an initial vector $x$. If the vectors have $n$ components, application of the symmetric group $S_n$ of all the permutations of $n$ letters to an initial $n$-vector gives a class of codes known as “permutation modulation”.

The latter codes admit an especially simple maximum-likelihood (ML) decoding algorithm. Assume that vector $r$ was received. The ML decoder must seek the vector $\pi x$ which maximizes the scalar product

$\sum_{i=1}^{n} r_i (\pi x)_i.$

This maximum is achieved when the largest component of $\pi x$ is paired with the largest component of $r$, the second largest component of $\pi x$ is paired with the second largest component of $r$, etc. This algorithm is algebraic in nature, and does not require the receiver to store all the code words.
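
The pairing rule can be sketched as a sort: rank the received components, then place the sorted components of the initial vector at the matching ranks. The function names below are illustrative, and the brute-force search over all permutations is included only as a cross-check for small n.

```python
from itertools import permutations


def decode_permutation_modulation(r, x):
    """Return the rearrangement y of x that maximizes sum(r[i] * y[i]):
    the largest component of x goes where r is largest, and so on."""
    order_r = sorted(range(len(r)), key=lambda i: r[i])   # indices of r, ascending
    x_sorted = sorted(x)                                  # components of x, ascending
    y = [0] * len(r)
    for rank, idx in enumerate(order_r):
        y[idx] = x_sorted[rank]
    return y


def brute_force_decode(r, x):
    """Exhaustive search over all rearrangements of x (small n only)."""
    return list(max(permutations(x),
                    key=lambda y: sum(a * b for a, b in zip(r, y))))


received = [0.9, -1.2, 0.1, 2.3]
initial = [-3, -1, 1, 3]
decoded = decode_permutation_modulation(received, initial)
print(decoded)
```

Sorting costs on the order of n log n comparisons, whereas the exhaustive search visits n! candidates, which is the saving the text refers to.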

Now, if the PSS is generated by a subgroup $Q$ of $S_n$, we may use the same basic decoding idea in two steps:

1. We first decode $r$ as if $Q = S_n$, obtaining as a result a permutation $\pi$ of $n$ letters. This may not belong to $Q$.

2. Next we “algebraically decode” $\pi$ into an element of $Q$.

Here  we  focus  on  the  first  decoding  step.  In  particular,  it 
can  be  proved  that 

•  Every  group  code  can  be  represented  in  the  form  of 
a  permutation  signal  set  acting  on  an  initial  vector  x 
with  n  components. 

•  The  minimum  value  of  n  is  obtained  as  follows.  If 
|H'|  denotes  the  largest  non-normal  subgroup  of  Q  that 
does  not  include  normal  subgroups  of  Q  other  than  the 
identity,  then  n  is  given  by  the  ratio 

Abstract - Algebraic fundamentals of convolutional
encoders  are  given  by  using  the  Schreier  product  and 
the  Theory  of  Machines. 

I.  Introduction 

The  majority  of  the  convolutional  encoders  known  in  the  tech¬ 
nical  literature  are  over  algebraic  fields.  Recently,  [1],  and  [2], 
have  shown  that  these  encoders  essentially  make  use  of  the 
additive  group  of  those  fields.  We  take  this  approach  and  de¬ 
fine  the  elementary  convolutional  encoder  (ECE)  over  abelian 
groups  and  we  point  out  their  main  properties  that  will  serve 
as  a  reference  to  the  definition  of  general  machines.  By  use 
of  the  Schreier  product,  the  general  convolutional  encoder 
(GCE)  is  defined.  As  a  consequence,  the  ECE  is  a  partic¬ 
ular  case  of  the  GCE.  The  Schreier  product  can  be  properly 
exploited  in  the  design  of  the  encoder.  As  an  example  of  this 
fact,  we  provide  two  results  about  the  machine  only  by  looking 
at  the  properties  of  this  product. 

II.  Machines 

Definition 1 A machine is a quintuple $M = (X, Y, Q, \delta, \beta)$, where $X$ is a finite set of inputs; $Y$ is a finite set of outputs; $Q$ is a (not necessarily finite) set (space) of states; $\delta : X \times Q \to Q$ is the next-state application; $\beta : X \times Q \to Y$ is the output application. □

Let $x^*$ be a finite string of elements of X. We say that the
machine $M = (X, Y, Q, \delta, \beta)$ is controllable if for all q and q'
$\in Q$, there is a string $x^*$ such that $q' = \delta^*(x^*, q)$, where $\delta^*$
is the natural recursive extension of $\delta$. Given $j \in \mathbb{N}$, if
$\forall q, q' \in Q\ \exists x^* \in X^*$ with $|x^*| \le j$ such that $q' = \delta^*(x^*, q)$,
then we will say that the machine is j-controllable. Clearly, if
M is j-controllable then it is (j + 1)-controllable. The number
$\nu = \min\{j \mid M \text{ is } j\text{-controllable}\}$ is the control index of M.
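
The control index of a finite machine can be computed by growing, for every start state, the set of states reachable with strings of length at most j. The following is a minimal sketch under stated assumptions: the function name `control_index` is hypothetical, the state set is assumed finite, and the empty string is counted as a string of length 0.

```python
def control_index(inputs, states, delta):
    """Return the smallest j such that every state reaches every state with an
    input string of length at most j, or None if the machine is not
    controllable. Assumes a finite state set and counts the empty string."""
    states = list(states)
    everything = set(states)
    # reach[q] = states reachable from q with strings of length <= j (j = 0 here)
    reach = {q: {q} for q in states}
    for j in range(len(states) + 1):
        if all(reach[q] == everything for q in states):
            return j
        grown = {
            q: reach[q] | {delta(x, p) for p in reach[q] for x in inputs}
            for q in states
        }
        if grown == reach:  # nothing new is reachable: not controllable
            return None
        reach = grown
    return None


# Illustrative machine: a binary shift register that remembers two inputs.
bits = [0, 1]
shift_states = [(a, b) for a in bits for b in bits]
shift = lambda x, q: (x, q[0])   # next state = (new input, previous newest)
stuck = lambda x, q: q           # a machine that never changes state

print(control_index(bits, shift_states, shift))
print(control_index(bits, shift_states, stuck))
```

The shift register needs two inputs to overwrite its whole state, while the second machine can never leave its start state.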

III.  Elementary  Convolutional  Encoder 
Definition 2 Let n, k, and m be natural numbers such that
$n \ge k \ge 1$ and $m \ge 1$. Consider the matrices $T^0, T^1, \ldots, T^m$,
with $T^i = (t^i_{rs})$, where $t^i_{rs} \in \mathbb{Z}$, $1 \le r \le k$, $1 \le s \le n$, and
$i = 0, 1, \ldots, m$. We define an elementary convolutional
encoder with parameters (n, k, m) over a finite abelian group
G as a machine $M = (X, Y, Q, \delta, \beta)$ where:

$X \subset G^k$ is the finite set of the input alphabet;

$Y \subset G^n$ is the set of the output alphabet;

$Q = \{q = (x^1, x^2, \ldots, x^m) \mid x^i \in X\} \subset (G^k)^m \cong G^{km}$ is the
set (or space) of the machine states;

$\delta : X \times Q \to Q$ is given by $\delta(x^0, q) = (x^0, x^1, x^2, \ldots, x^{m-1})$
(the next state map);

$\beta : X \times Q \to Y$ is given by $\beta(x^0, q) = \beta(x^0, x^1, \ldots, x^m) =
x^0 T^0 + x^1 T^1 + \cdots + x^m T^m$ (the machine's outputs). □
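
As an illustration of Definition 2, the sketch below builds the next-state and output maps of an elementary convolutional encoder over the cyclic group $Z_q$, where an integer matrix entry acts on a group element by multiplication mod q. The helper name `make_ece` and the example matrices are assumptions chosen for illustration, not taken from the paper.

```python
def make_ece(q, T):
    """Build (delta, beta) for an ECE over G = Z_q.
    T = [T0, ..., Tm], each a k x n integer matrix given as a list of rows.
    A state is a tuple (x1, ..., xm) of k-tuples over Z_q."""
    m = len(T) - 1
    k, n = len(T[0]), len(T[0][0])

    def delta(x0, state):
        # Shift the new input in and drop the oldest stored input.
        return (x0,) + state[:-1]

    def beta(x0, state):
        # Output x0*T0 + x1*T1 + ... + xm*Tm, computed componentwise mod q.
        blocks = (x0,) + state
        return tuple(
            sum(blocks[i][r] * T[i][r][s] for i in range(m + 1) for r in range(k)) % q
            for s in range(n)
        )

    return delta, beta


# Hypothetical (n, k, m) = (2, 1, 2) encoder over Z_4.
delta, beta = make_ece(4, [[[1, 1]], [[0, 1]], [[1, 3]]])
state = ((0,), (0,))
outputs = []
for x0 in [(1,), (2,), (3,)]:
    outputs.append(beta(x0, state))
    state = delta(x0, state)
print(outputs, state)
```

After three inputs the state holds the two most recent input symbols, newest first, as the next-state map prescribes.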

1This work was supported in part by FAPESP under grant
92/4845-7, and it has been supported by CNPq under grant
301416/85-0, Brazil.


From  this  definition  we  can  show  the  following  properties  of 
the  ECE: 

Proposition 1 If X is a group, then: i) Q and $\beta(X, Q)$
are also groups. ii) The Cartesian product $X \times Q$ becomes
a direct product of groups and the mappings $\delta$ and $\beta$ are
group homomorphisms, with $\delta$ being surjective. iii) The sets
$Y_0 = \{\beta(x, e_Q)\}_{x \in X}$ and $Y_1 = \{\beta(x, q) \mid \delta(x, q) = e_Q\}$ are
normal  subgroups  of  and  ~  ~  Q ■  iv) 

The ECE is a controllable machine, with control index $\nu \le m$.

IV.  General  Convolutional  Encoder 
Definition 3 Let X and Q be two finite groups. Let $\sigma : Q \to
\mathrm{Aut}(X)$ and $\rho : Q \times Q \to X$ be mappings that, for any
$q_1, q_2, q_3 \in Q$ and $x \in X$, satisfy the following conditions:
1) $\sigma(q_1)(\rho(q_2, q_3)) \cdot \rho(q_1, q_2 q_3) = \rho(q_1, q_2) \cdot \rho(q_1 q_2, q_3)$
and 2) $\sigma(q_1)(\sigma(q_2)(x)) = \rho(q_1, q_2) \cdot \sigma(q_1 q_2)(x) \cdot \rho(q_1, q_2)^{-1}$.

Then, we define the Schreier product of X and Q as the set of
ordered pairs (x, q), with $x \in X$ and $q \in Q$, under the
following operation:

$(x, q) * (x', q') = (x \cdot \sigma(q)(x') \cdot \rho(q, q'),\ q q')$. □

This Schreier product is a group with identity element
$(\rho(e_Q, e_Q)^{-1}, e_Q)$, where $e_Q$ is the identity element of Q.
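
A small worked instance may help make the Schreier product concrete. The sketch below assumes X = Q = Z_2 written additively, a trivial action $\sigma$, and a factor set $\rho$ that is nonzero only at (1, 1); these choices are illustrative assumptions, not taken from the paper. With them the product is a group of order 4 that is cyclic, so it differs from the direct product Z_2 x Z_2.

```python
from itertools import product

X = [0, 1]          # Z_2, written additively
Q = [0, 1]          # Z_2, written additively
add2 = lambda a, b: (a + b) % 2
neg2 = lambda a: (-a) % 2
sigma = lambda q, x: x                                   # trivial action (assumption)
rho = lambda q1, q2: 1 if (q1, q2) == (1, 1) else 0      # factor set (assumption)


def op(a, b):
    """(x, q) * (x', q') = (x + sigma(q)(x') + rho(q, q'), q + q')."""
    (x, q), (x2, q2) = a, b
    return (add2(add2(x, sigma(q, x2)), rho(q, q2)), add2(q, q2))


elements = list(product(X, Q))
identity = (neg2(rho(0, 0)), 0)      # (rho(e_Q, e_Q)^{-1}, e_Q)

# Brute-force check of the group axioms on this small example.
associative = all(
    op(op(a, b), c) == op(a, op(b, c))
    for a in elements for b in elements for c in elements
)
has_identity = all(op(identity, a) == a == op(a, identity) for a in elements)


def order(g):
    n, h = 1, g
    while h != identity:
        h, n = op(h, g), n + 1
    return n


print(associative, has_identity, order((0, 1)))
```

Setting `rho` to the zero map in the same code gives the direct product, in which every non-identity element has order 2.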

Definition 4 A general convolutional encoder, with parameter
$\nu$, is a $\nu$-controllable Schreier machine $M_{\sigma,\rho} =
(X, Y, Q, \delta, \beta)$ such that the application $\Psi$ from the Schreier
product of X and Q to $Q \times Y \times Q$ given by
$\Psi(x, q) = (q, \beta(x, q), \delta(x, q))$ is injective. □

Assuming the set X is a group, and since the direct product
is a particular case of the Schreier product, we have that the
ECE is a particular case of the GCE. Let $T = \mathrm{Im}(\Psi)$ be the set of
edges of the trellis; T is a group isomorphic to the Schreier
product of X and Q. Moreover, the sets $T_0 = \{\Psi(x, e_Q)\}_{x \in X}$ and
$T_1 = \{\Psi(x, q) \mid \delta(x, q) = e_Q\}$ are normal subgroups of T and
$T/T_0 \cong T/T_1 \cong Q$. On the other hand, $T_0 \cong X$. Hence, if $T_0 = T_1$, then,
given $q \ne e_Q$, there is no $x^*$ such that $\delta^*(x^*, e_Q) = q$. Thus,
we have:

Theorem 1 If the class $\chi$ = {H : H is a normal subgroup of the
Schreier product of X and Q with |H| = |X|} has no more than one
element, then the machine is non-controllable.
Abstract - We show how to construct and classify
inequivalent homogeneous rate-k/(k+1) trellis codes using
principles of computational group theory. Given
a complete classification of useful trellis structures,
trellis codes based on groups are no more difficult to
construct than trellis codes based on binary fields.

I.  Motivation 

A  homogeneous  code  6  [1,  2]  is  the  orbit  Cx  of  a  group 
code  C  [3,  4]  acting  on  a  constant  sequence  x.  The  class 
of  homogeneous  codes  is  larger  than  the  class  of  binary  linear 
convolutional  codes,  and  may  therefore  be  expected  to  contain 
new useful trellis codes. While linear codes, which are always
homogeneous, can be found by enumerating parity check equations,
codes constructed from non-abelian groups cannot. We
are therefore forced to use more complex methods from group
theory.

II.  Methods 

We  can  separate  the  problem  of  finding  homogeneous  codes 
into  three  parts:  choosing  a  group  structure  for  the  trellis,  as¬ 
signing  labels  to  trellis  branches,  and  testing  for  pathological 
behavior.  Like  the  enumeration  methods  used  for  convolu¬ 
tional  codes,  the  partitioning  and  labeling  of  the  signal  set  is 
essentially  independent  of  the  code  search. 

A  suitable  definition  of  equivalence  for  homogeneous  codes 
greatly  reduces  the  number  of  distinct  structures  that  must 
be  examined  at  each  step.  Two  homogeneous  codes  are  equiv¬ 
alent  if  there  is  a  bi-infinite  sequence  of  label  permutations 
that  maps  one  to  the  other.  It  can  be  shown  that  equivalent 
codes  are  always  related  by  a  constant  sequence  of  permu¬ 
tations.  Thus,  code  equivalence  is  simply  trellis  equivalence, 
where  two  labeled  trellises  are  equivalent  if  there  is  a  permu¬ 
tation  of  states  and  labels  that  takes  one  trellis  to  the  other. 

Group  trellis  structures  are  enumerated  using  derivative 
codes  and  group  extensions.  The  size,  rate,  and  controllability 
properties  of  the  trellis  are  selected  in  advance;  this  fixes  the 
locations  of  the  trellis  branches.  A  given  trellis  admits  only 
one  binary  linear  algebraic  structure.  But  it  may  have  several 
different  group  structures.  Fortunately,  despite  the  enormous 
number  of  nonisomorphic  groups  of  even  small  order,  only  a 
handful  appear  as  the  algebraic  structure  of  a  trellis. 

These  groups  are  found  by  enumerating  group  extensions. 

If C is a group code, its derivative code C' is formed by taking
the set of state sequences traversed by the sequences of C.
Iterated derivatives terminate at the trivial code. If C has
no parallel transitions then C and C' are isomorphic. Hence
(unlabeled) trellises can be enumerated up to equivalence by
enumerating derivatives up to isomorphism.

The derivative operation strips away any parallel transitions
in the trellis of C. Reversing the derivative in such cases
requires a group extension of the trellis by its parallel branch
group. Group extensions of 2-ary groups by 2-ary groups,
which are the only type that arise for rate-k/(k+1) codes, are
straightforward to enumerate for moderately sized groups.

1This  work  was  supported  by  NSF  Grant  NCR-9457509 


Given  an  unlabeled  group  trellis,  the  next  step  is  to  assign 
labels  to  branches.  It  suffices  to  assign  only  the  zero-labeled 
branches  in  the  trellis  because,  for  a  homogeneous  code,  the 
zero-labeled  branches  form  a  subgroup  of  the  trellis,  and  each 
right  coset  of  this  subgroup  is  distinctly  labeled. 

Zero labeling proceeds as follows. The states which have
exiting zero branches are a subgroup of the state group, as
are the states with entering zero branches. In fact, these two
subgroups must be isomorphic and, for rate-k/(k+1) codes,
must be half the size of the state group. Choosing the left
and right zero-labeled state groups therefore amounts to
enumerating a restricted class of subgroups of index 2. The
zero-labeled branches define an isomorphism between the left and
right zero-labeled state groups; assigning zero branches is
tantamount to enumerating isomorphisms from one subgroup to
another. Recent advances in computational group theory have
solved this problem for 2-ary groups.

The  last  step  in  the  construction  of  useful  trellis  group 
structures  is  to  test  the  trellis  for  catastrophic  behavior.  For 
group  codes,  this  test  is  performed  by  checking  if  the  zero- 
labeled  branch  group  admits  a  periodic  path  through  the  trel¬ 
lis.  Interestingly,  this  final  test  eliminates  many  nonabelian 
state  groups  for  which  no  noncatastrophic  labeling  exists. 

The  final  step  of  mapping  branch  labels  to  elements  of  a 
partitioned  signal  set  can  proceed  as  with  the  standard  binary 
linear  case. 

III.  Results 

The  methodology  developed  above  reduces  the  problem  of 
enumerating  useful  groups  to  an  essentially  mechanical  pro¬ 
cess.  Preliminary  results  for  small  codes  are  tabulated  below. 
The  results  for  binary  linear  codes  were  computed  primarily 
for  verification;  they  can  also  be  found  by  counting  parity 
check  equations.  Note  that  nonabelian  codes  become  more 
plentiful  beyond  16  states. 


states   rate   state group       number
4        1/2    Z2 x Z2           4
8               Z2 x Z2 x Z2      16
         2/3    Z2 x Z2 x Z2      12
         2/3    D8                1
16       1/2                      64
         2/3    (Z2)^4            48
We consider codes of the following type. Let S (the signal
set) be a subset of n-dimensional Euclidean space $\mathbb{R}^n$. Let
$f : S \to S$ be a continuous mapping. The code C(S, f) consists
of those bi-infinite sequences $x = \ldots, x_{-1}, x_0, x_1, x_2, \ldots \in S^{\mathbb{Z}}$
that satisfy

$x_t = f(x_{t-1})$

for all $t \in \mathbb{Z}$. Note that the "future" of each codeword is
completely determined by its "past."

At  first  sight,  it  might  seem  that  the  information  rate  (i.e., 
the  number  of  information  bits  per  code  symbol)  of  any  such 
code  must  be  zero.  However,  as  the  example  below  shows,  this 
need  not  be  so  if  S  is  an  infinite  set. 

Throughout the paper, A will denote some finite alphabet
and B will denote a subshift (i.e., a closed, shift-invariant
subset) of $A^{\mathbb{Z}}$. Let $\sigma : A^{\mathbb{Z}} \to A^{\mathbb{Z}}$ be the left shift operator.

Example: Let A = {0, 1}, let B be any subshift of $A^{\mathbb{Z}}$, and
let $\rho : B \to [0, 1]$ be the mapping

$\ldots, b_{-1}, b_0, b_1, b_2, \ldots \mapsto \sum_{t=0}^{\infty} b_t 2^{-(t+1)}$.

We then define a code C as the image of the encoding rule
$B \to C : b \mapsto x$ with

$x_t = e^{i 2\pi \rho(\sigma^t(b))}$.

Clearly, the information rate of C is one bit per symbol. The
signal set S is some subset of the unit circle. But

$x_t = e^{i 2\pi \cdot 2\rho(\sigma^{t-1}(b))} = x_{t-1}^2$,

which shows that C = C(S, f) for $f : x \mapsto x^2$.
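
The squaring relation can be checked numerically. The sketch below is illustrative only: it truncates the binary expansion after `K` terms (an assumption needed for finite arithmetic), uses an arbitrary bit pattern, and the names `rho` and `encode` are hypothetical helpers.

```python
import cmath

K = 40  # truncation depth of the binary expansion (assumption)


def rho(bits, start):
    """Truncated rho applied to the sequence shifted left by `start`."""
    return sum(bits[start + j] * 2.0 ** -(j + 1) for j in range(K))


def encode(bits, n):
    """Code symbols x_0..x_{n-1} on the unit circle; needs len(bits) >= n + K."""
    return [cmath.exp(2j * cmath.pi * rho(bits, t)) for t in range(n)]


bits = [(t * t + t // 3) % 2 for t in range(60)]   # arbitrary information bits
x = encode(bits, 20)

# Each symbol should be the square of its predecessor, up to truncation error.
max_err = max(abs(x[t] - x[t - 1] ** 2) for t in range(1, 20))

# The all-zeros and all-ones inputs land on (essentially) the same point.
x_zeros = encode([0] * 60, 1)[0]
x_ones = encode([1] * 60, 1)[0]
print(max_err, abs(x_zeros - x_ones))
```

Both printed quantities are on the order of the truncation error $2\pi \cdot 2^{-K}$, which also illustrates why the all-zeros and all-ones information sequences collide.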

For  B  =  Az ,  this  example  was  first  presented  in  [1],  where 
it  was  shown  that  the  code  is  a  group  code  (or  geometrically 
uniform  [2])  and  has  a  well  defined  minimum  distance.  It  then 
turned  out  that  this  code  is  actually  a  standard  example  of  a 
chaotic  dynamical  system  [3].  The  related  idea  of  using  chaotic 
systems  to  produce  waveforms  for  communications  had  earlier 
been  proposed  in  [4]. 

The choice of $B = A^{\mathbb{Z}}$ in the example causes the following
problem: the all-ones information sequence and the all-zeros
information sequence are mapped to the same codeword. (One
can prove that some problem of this type always occurs if S
is connected.) The remedy is to restrict B to a subshift of $A^{\mathbb{Z}}$
that forbids too many consecutive zeros (or ones, or both zeros
and ones). The resulting effective signal set S is a fractal and
totally disconnected (like the Cantor set). While this seems
odd at first sight, the resulting codes are well-behaved in every
respect; in particular, they can be encoded and decoded with
finite memory and finite-precision arithmetic.

It can also be shown that codes of this type can have an
arbitrarily large minimum distance, which dispels any lingering
suspicion that such codes are somehow inherently "bad."
Abstract - A class of binary-to-q-ary convolutional
codes is studied where the operation performed in
the encoder is addition modulo q. For rate 1 codes
an extended spectrum for the codes is defined and a
necessary and sufficient condition for the encoder to
be catastrophic is given. Optimal codes, for relatively
small alphabet size q and memory size m, found by
computer search are reported.

I.  Summary 

Consider a rate 1 memory m binary-to-q-ary encoder where
the operations in the encoder are performed over $Z_q$ (the ring
of integers mod q). The encoder input sequence is binary and
the encoder output sequence as well as the encoder generator
coefficients are q-ary, i.e.

$v(D) = u(D) g(D) \bmod q$,

where

$v(D) = v_0 + v_1 D + v_2 D^2 + \cdots$,  $v_i \in \{0, 1, \ldots, q-1\}$
$u(D) = u_0 + u_1 D + u_2 D^2 + \cdots$,  $u_i \in \{0, 1\}$
$g(D) = g_0 + g_1 D + \cdots + g_m D^m$,  $g_i \in \{0, 1, \ldots, q-1\}$.

If the input bits are viewed as indicator functions, the only
operation performed in the encoder is addition modulo q.
In comparison to the encoders reported in [1], the choice of
output alphabet is therefore less restrictive. Since the input
alphabet is not a subfield of the output alphabet, a different
approach must be taken regarding free distance and
distance spectrum for the code. Furthermore, it is possible
that a rate 1 binary-to-q-ary encoder is catastrophic
although no input sequence of infinite Hamming weight results
in an output sequence with only a finite number of nonzero
symbols. An example is the rate 1 binary-to-6-ary encoder
$g(D) = 2 + 2D + 4D^2$, shown in Fig. 1.


For this encoder it is easily seen that no input sequence with
infinite Hamming weight gives an output sequence with finite
Hamming weight. However, the encoder is catastrophic since
the two (infinite) input sequences u = 10010111 1001... and
u' = 01111001 0111... result in output sequences that only
differ in the first position. Moreover, the distance spectrum for
a binary-to-q-ary code cannot be defined in an appropriate
way, because different output sequences may have different
distance spectra.


In order to "circumvent" these difficulties we observe that
the difference between two input sequences, u and u', is a
vector with elements in {-1, 0, 1}. The properties of an encoder
are for this reason evaluated by use of an extended input
alphabet with elements from {-1, 0, 1}, i.e. the input sequence is
$u_{ext}(D) = u_0 + u_1 D + u_2 D^2 + \cdots$, where $u_i \in \{-1, 0, 1\}$. The
corresponding weight distribution of the output sequence is
then evaluated. For the encoder above we find that it is
catastrophic since the (infinite) input sequence 1 -1 -1 0 -1 1 1 0
1 -1 -1 0 ... results in the output sequence 2 0 0 0 ....
The conditions for a rate 1 binary-to-q-ary encoder g(D) to
be catastrophic can be summarized in

Theorem 1 A rate 1 binary-to-q-ary encoder is catastrophic
if and only if there exist an integer N and a sequence
$u_N(D)$ such that $(1 - D^N) \mid u_N(D) g(D) \bmod q$, where $u_N(D) =
u_0 + u_1 D + \cdots + u_{N-1} D^{N-1}$, $u_n \in \{-1, 0, 1\}$.

Applying the theorem to the encoder in Fig. 1, we find that it
is catastrophic since for N = 8 and $u_N(D) = 1 - D - D^2 -
D^4 + D^5 + D^6$ we have $u_N(D) g(D) = 2 + 4D^8 = 2(1 - D^8)
\bmod q$.

To find an optimum code an "extended distance spectrum"
corresponding to input symbols from the extended input
alphabet was calculated according to the idea described in [2].
If we let $n(d_{free} + i)$ denote the (i + 1)th spectral component,
then the codes found by computer search are optimal in the
sense that the free distance is maximal, i.e. $d_{free} = m + 1$,
and no code exists such that, for any $l = 0, 1, 2, \ldots$,

$n(d_{free} + i) = n_{opt}(d_{free} + i)$,  $i = 0, 1, \ldots, l - 1$,

$n(d_{free} + i) < n_{opt}(d_{free} + i)$,  $i = l$.

In Table 1 the first three spectral components for the found
optimal codes, corresponding to distances m + 1, m + 2 and
m + 3, are given.


memory size   m = 1      m = 3        m = 5        m = 7
q = 4         (2,4,8)    (6,20,92)    -            -
q = 5         (2,4,8)    (4,16,100)   (8,70,364)   -
q = 6         (2,4,8)    (2,12,62)    (4,16,126)   (10,42,224)
q = 7         (2,4,8)    (2,8,32)     (2,24,64)    (4,48,184)
q = 8         (2,4,8)    (2,4,16)     (2,6,42)     (2,14,70)

Table 1: Best distance spectrum for some values of memory
size m and alphabet size q. That no code with
d_free = m + 1 was found is indicated by "-".
Abstract - The concept of dual codes is formulated
in terms of characters and abelian groups. The
MacWilliams transform is established under
general conditions. It is demonstrated that this
transform can naturally be regarded as a
partitioning of a Fourier transform.

Let A be a finite abelian group and let $U = \{z \in \mathbb{C} :
|z| = 1\}$ be the set of units in the complex plane $\mathbb{C}$.
Any homomorphism $\varphi : A \to U$ is an irreducible
character of A. The set $\hat{A}$ of all irreducible
characters is an abelian group isomorphic to A
(Herstein [1], p. 115). Let $\Psi : A \to \hat{A}$ be a fixed
isomorphism from A to $\hat{A}$, taking the element a
in A to the irreducible character $\Psi_a$ in $\hat{A}$.

By a code we understand a subset C in the group
A. The code is a group code if it is a subgroup in
A. For any group code C in A we define the dual
$C^\perp$ according to

$C^\perp = \{a \in A : \Psi_a(C) = \{1\}\}$.

It is easy to see that $C^\perp$ is also a group.

Now suppose there is a weight w(x) associated
with each element $x \in A$. More formally, let
$w : A \to \mathbb{R}^+$ be a map from the finite group A into
the set $\mathbb{R}^+$ of non-negative real numbers. Denote
by W the range of this map and let C be a code in
A. For any $u \in W$ we define

$A_u = |\{x \in C : w(x) = u\}|$.

The weight distribution of the code C is $A = \{(u, A_u) : u \in W\}$.

A bijection $T : A \to A$ such that w(Tx) = w(x)
holds for any element x in A is called a weight-preserving
transformation. If T and S are two
weight-preserving transformations, define the
product TS according to TS(x) = T(S(x)), $x \in A$.
It is clear that the set of all weight-preserving
transformations forms a group under this
product. We denote this group by Q.

Lemma: Let the characters $\Psi_x$ satisfy $\Psi_{Tx}(Ty)
= \Psi_x(y)$; $x, y \in A$; $T \in Q$. Under this assumption
there is a function $K : A \times A \to \mathbb{R}^+$ such that

$J_u(y) = \sum_{x \in A} j_u(x) \Psi_x(y) = K(u, w(y))$,  $x, y \in A$,

where $j_u(x)$ is the indicator function for the
weight w and where $J_u(y)$ is its Fourier
transform. Moreover, under these conditions the
weight distribution $A^\perp$ for the dual code $C^\perp$ is
given by

$A^\perp_u = \sum_{v \in W} K(u, v) A_v$.

The last relation is the MacWilliams identity.
We address the question under what conditions
this holds. One general result is as follows.
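
The classical special case can be checked by brute force. The sketch below assumes $A = Z_2^4$ with Hamming weight and the characters $\Psi_x(y) = (-1)^{x \cdot y}$, and it uses the usual $1/|C|$ normalization of the binary MacWilliams identity; that normalization and the example code are assumptions of this sketch, not statements taken from the text above.

```python
from itertools import product

n = 4
A = list(product([0, 1], repeat=n))            # the group Z_2^n
w = lambda x: sum(x)                           # Hamming weight
psi = lambda x, y: (-1) ** sum(a * b for a, b in zip(x, y))   # character Psi_x(y)


def K(u, v):
    """J_u(y) = sum over x of weight u of Psi_x(y), for any y of weight v.
    The set below has one element exactly when J_u depends on y only via w(y)."""
    values = {sum(psi(x, y) for x in A if w(x) == u) for y in A if w(y) == v}
    assert len(values) == 1
    return values.pop()


def distribution(code):
    return [sum(1 for x in code if w(x) == u) for u in range(n + 1)]


C = [(0, 0, 0, 0), (1, 1, 1, 1)]               # repetition code (illustrative)
dual = [a for a in A if all(psi(a, c) == 1 for c in C)]

A_C = distribution(C)
predicted = [
    sum(K(u, v) * A_C[v] for v in range(n + 1)) // len(C)
    for u in range(n + 1)
]
print(predicted, distribution(dual))
```

Here the dual is the even-weight code, and the transform of the weight distribution of C reproduces its weight distribution exactly.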
We  address  the  question  under  what  conditions 
this  holds.  One  general  result  is  as  follows. 

Denote by $\mathbb{R}^A$ the set of all functions $f : A \to \mathbb{R}$
and let L(w) denote the linear subspace in $\mathbb{R}^A$
spanned by the functions $\{j_u : u \in A\}$. It is clear
that this set forms an orthogonal basis in L(w).
The set $\{J_u : u \in A\}$ is of course also an orthogonal
basis. We denote by $L^\perp(w)$ the linear space
spanned by this new basis. In general the spaces
L(w) and $L^\perp(w)$ are different. Occasionally,
however, they might coincide.

Theorem: The MacWilliams identity holds if
and only if the weight w is such that $L(w) = L^\perp(w)$.
Abstract - This paper investigates the performance
(differential SNR) of two detectors for spread-spectrum
signals modeled as random processes embedded in channel
noise. Linear interference suppression is performed on
the multiple-access interference prior to detection; thus
the noise in the detection problem is comprised of colored
noise and residual multiple-access interference. It is
observed that a non-linear detector outperforms a purely
linear detector.

1.  Introduction 

A  Code-Division  Multiple- Access  (CDMA)  based  digi¬ 
tal  communication  system  is  considered.  Bandwidth  effi¬ 
ciency,  complexity  and  security  issues  motivate  the  search 
for  schemes  to  integrate  new  user  information  into  central¬ 
ized  demodulators.  In  order  to  accommodate  a  new  user  into 
a  receiver,  its  presence  must  be  detected. 

It  is  straightforward  to  show  that  the  locally  optimum  de¬ 
tector  for  this  detection  problem  optimizes  the  differential 
SNR.  However  the  locally  optimal  detector  is  infeasible  to 
implement;  thus,  simpler,  noise-distribution-independent  de¬ 
tectors  are  pursued.  A  non-linear  detector  is  considered  to 
compensate  for  the  presence  of  the  residual  multiple-access 
interference  (RMAI)  which  is  non-Gaussian  in  nature.  Com¬ 
parison  is  made  with  a  linear  detector  which  is  better  suited 
to  Gaussian  noise. 

2.  The  Detection  Problem 

The  signal  to  be  used  for  detection  of  the  spread-spectrum 
signal  is  a  linear  transformation  of  the  received  signal.  The 
transformation,  V,  is  chosen  to  suppress  multiple-access  in¬ 
terference.  This  gives  rise  to  a  hypothesis  testing  problem 
that  can  be  cast  as  follows: 
ff0  :  x.  — 

#i  :  =  Vn{  +  £•  +  05-  for  i  G  [1,  A7], 

0  is  an  SNR  parameter.  Un-  is  the  ambient  channel  noise 
which  will  be  modeled  as  a  zero-mean,  colored,  additive 
Gaussian  process.  is  the  RMAI.  For  K  existing  users, 
0-  is  drawn  from  one  of  2A  possible  random  vectors  with 
equal  probability.  We  assume  that  the  stochastic  signal  s  •  is 
zero  mean. 

We  examine  detectors  based  on  real-valued  detection 
statistics,  Tx(x),  compared  to  thresholds.  The  differential 
SNR  for  the  random  signal  case  is  defined  as  , 

^  i  l  [lim02^o  A_E0{TAr(x)}j 

^ T)  =  JV— -00  iV  Var0{T,v(x)}  ' 

The  two  detectors  under  study  have  the  following  form: 

if  L 

Tn(x)  -  ^2  V  k )  $(*2*  Jt), 

*  =  1  A:  ”  1 

w'here  $(x)  =  x  for  the  simple  correlator  ( Tsco )  and 

*This  research  was  supported  by  the  U.  S.  Army  Research  Of¬ 
fice  under  Grant  DAAH04-93-G-0219. 


$0*0  =  sgn(x)  for  the  non-linear  polarity  Coincidence  corre¬ 
lator  (Tpcc).  It  can  be  shown  that  the  decision  statistics 
for  these  two  correlators  under  both  hypotheses  are  asymp¬ 
totically  normal  and  hence  justify  the  use  of  the  differential 
SNR  as  a  performance  measure. 

While  one  can  easily  determine  the  differential  SNR  for 
Tsco ,  calculation  of  the  differential  SNR  for  Tpcc  involves 
the  evaluation  of  the  following  probability:  P[x  >  0,  y  >  0] 
for  x  and  y  two  jointly  Gaussian,  correlated  random  variables 
with  non-zero  means.  No  closed  form  expression  exists  for 
this  quantity  [1];  thus  we  bound  this  probability  to  yield  the 
following, 

$0 \le |E\{\mathrm{sgn}(x)\,\mathrm{sgn}(y)\}| \le \frac{2}{\pi} \arcsin\left(\frac{\rho}{\sigma_x \sigma_y}\right)$,

where $\rho = \mathrm{Cov}\{x, y\}$. Use of these bounds yields upper and
lower bounds for the differential SNR of Tpcc.

3.  Performance  Example 

Performance  is  studied  in  the  context  of  a  decorrelator 
[2]  based  multi-user  receiver  system.  It  is  assumed  that  the 
spreading  codes  are  mismatched  between  the  mobile  trans¬ 
mitters  and  the  receiver  (e.g.  due  to  multi-path)  thus  RMAI 
will  be  present.  We  consider  an  environment  where  the  ab¬ 
solute  value  of  the  cross-correlation  between  the  signature 
sequences  is  increased  by  0  <  (  <  1,  and  the  auto-correlation 
is  decreased  by  £;  £  captures  the  worst-case  mismatch  due 
to  propagation  effects.  It  is  clear  from  Figure  1  that  the 
non-linear  detector  maintains  a  distinct  advantage  over  the 
linear  one.  The  performance  of  a  multi-user  system  based  on 
conventional  matched  filter  receivers  is  also  studied,  but  not 
presented  here. 
Figure 1: Asymptotic Relative Efficiencies between Tpcc (lower
bound) and Tsco for 10 users (length 31 codes).
I. Introduction

Minimum  Mean  Squared  Error  (MMSE)  demodulation  for 
direct-sequence  CDMA  systems  [1]  eliminates  the  near-far 
problem,  and  can  be  implemented  adaptively  (i.e.,  without 
explicit  knowledge  of  the  parameters  of  the  multiple-access  in¬ 
terference),  given  a  training  sequence  for  the  desired  transmis¬ 
sion.  However,  prior  to  timing  acquisition ,  the  receiver  does 
not  know  the  phase  of  the  training  sequence,  i.e.,  it  does  not 
know,  for  a  given  observation  interval,  which  bit  of  the  train¬ 
ing  sequence  contributes  the  most  signal  energy.  Conceivably, 
this  timing  information  could  be  obtained  using  conventional 
acquisition  techniques  by  correlating  over  long  enough  inter¬ 
vals  and  applying  enough  power  control  to  resolve  the  near-far 
problem.  In  this  paper,  however,  we  present  an  adaptive  ap¬ 
proach  to  the  problem  of  near-far  resistant  joint  acquisition 
and  demodulation. 

Our  method  is  to  use  a  training  sequence  with  a  short  pe¬ 
riod  P,  and  run  P  adaptive  algorithms  either  serially  or  in 
parallel,  one  for  each  assumed  phase  of  the  training  sequence. 
The  adaptive  algorithm  that  yields  the  least  Mean  Squared 
Error  (MSE)  corresponds  to  the  correct  phase,  and  yields  in 
addition  an  MMSE  correlator  that  can  be  used  for  continued 
training  or  for  decision-directed  adaptation.  Thus,  acquisition 
results  in  a  near-far  resistant  demodulator  that  implicitly  ac¬ 
counts  for  the  timings  and  amplitudes  of  all  the  transmissions 
without  explicitly  estimating  even  the  timing  of  the  desired 
signal.  (Estimates  of  the  latter  can  be  derived  from  the  result¬ 
ing  MMSE  demodulator  if  required.)  We  note  that  a  method 
for  joint  acquisition  and  demodulation  that  does  not  require 
a  training  sequence  has  also  been  devised  [2]. 

II.  Model  and  Algorithm 

We  consider  an  equivalent  synchronous  model  for  an  asyn¬ 
chronous  direct-sequence  CDMA  system,  obtained  by  chip 
matched  filtering,  sampling  at  (a  multiple  of)  the  chip  rate, 
and  restricting  attention  to  a  finite  observation  interval  for 
each bit decision. The received vector $r_n \in \mathbb{R}^d$ used for the
nth bit decision is given by

$r_n = b_0[n] u_0 + \sum_{j=1}^{J} b_j[n] u_j + w_n$  (1)

where $u_0$ is the vector modulating the desired bit $b_0[n]$, and,
for $1 \le j \le J$, $b_j[n]$ are interfering bits due to intersymbol
interference and multiple-access interference, $u_j$ are interference
vectors modulating these bits, and $w_n$ is additive white
Gaussian noise. The received vectors for subsequent bits are
obtained by sliding the observation interval by T, where T is
the bit interval. The vectors $u_j$ are linear combinations of
shifts of the spreading sequences used by the various
transmissions; we do not assume knowledge of these vectors. Our

1This work was supported in part by funds from the University
objective is to arrive at a linear receiver that provides a bit
estimate $\hat{b}_0[n] = \mathrm{sgn}(c^T r_n)$, where c is chosen to minimize the
MSE $E[(c^T r_n - b_0[n])^2]$.

The desired transmission sends a periodic sequence (period
P) of training bits t[n]. We consider an observation interval of
at least 2T, so that one bit of the desired transmission must
fall completely within it. Letting $b_0[n]$ denote this bit, we must
have $b_0[n] = t[n + k^*]$ for some unknown integer $k^*$ between 0
and P - 1. Since the phase $k^*$ of the training sequence is not
known while in acquisition mode, we run P adaptive MMSE
demodulators, each corresponding to one of the following
hypotheses about the phase of the training sequence:

$H_i$: $b_0[n] = t[n + i]$,  $i = 0, 1, \ldots, P - 1$  (2)

For example, under a least squares implementation of this
algorithm spanning M observation intervals, the correlator
for the ith hypothesis is given by

$c_i = R^{-1} \hat{u}(i)$  (3)

where $R = (1/M) \sum_{n=1}^{M} r_n r_n^T$ is the empirical crosscorrelation
matrix for the received vector, and

$\hat{u}(i) = (1/M) \sum_{n=1}^{M} t[n + i]\, r_n$

is the estimate of the desired signal vector $u_0$ under
hypothesis $H_i$. The estimated MSE under hypothesis $H_i$ is given by
$\eta_i = 1 - c_i^T \hat{u}(i)$. The best hypothesis is the one with the
smallest estimated MSE, and the corresponding correlator $c_i$ is a
near-far resistant demodulator by virtue of the near-far
resistance of the MMSE demodulator [1]. Good hypotheses can be
combined to further enhance performance. This method relies
on the training sequence having good periodic autocorrelation,
and on the data bits for the interfering transmissions being
uncorrelated with those of the desired transmission. If multiple
transmissions are being simultaneously acquired, their
training sequences should have good periodic crosscorrelations.
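
The least squares procedure can be sketched on synthetic data as follows. Everything about the simulation is an assumption chosen for illustration: the 4-dimensional signal vectors, the single strong interferer, the noise level, the length-7 training sequence, and the helper names `solve` and `acquire`. Time indices run from 0 rather than 1.

```python
import random


def solve(matrix, rhs):
    """Solve matrix * c = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def acquire(r, t, P):
    """Return (eta, i, c) for the phase hypothesis with least estimated MSE."""
    M, d = len(r), len(r[0])
    R = [[sum(v[a] * v[b] for v in r) / M for b in range(d)] for a in range(d)]
    best = None
    for i in range(P):
        u_hat = [sum(t[(n + i) % P] * r[n][a] for n in range(M)) / M for a in range(d)]
        c = solve(R, u_hat)                                   # c_i = R^{-1} u_hat(i)
        eta = 1.0 - sum(ci * ui for ci, ui in zip(c, u_hat))  # estimated MSE
        if best is None or eta < best[0]:
            best = (eta, i, c)
    return best


random.seed(1)
t = [1, 1, 1, -1, -1, 1, -1]        # one period of a +/-1 training sequence
P, k_star, M = 7, 3, 210            # true phase k_star is unknown to acquire()
u0 = [0.5, 0.5, 0.5, 0.5]           # desired signal vector (assumed)
u1 = [5.0, -5.0, 5.0, 5.0]          # interferer ten times stronger (near-far)
r = []
for n in range(M):
    b0 = t[(n + k_star) % P]
    b1 = random.choice([-1, 1])
    r.append([b0 * u0[a] + b1 * u1[a] + random.gauss(0, 0.3) for a in range(4)])

eta, i_hat, c = acquire(r, t, P)
print(i_hat, round(eta, 3))
```

Running the P hypotheses over the same block of data shares the matrix R, so only the vector $\hat{u}(i)$ and one linear solve change per hypothesis.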

In the conference presentation, we will (a) show via simulation
of the least squares algorithm that a near-far resistant
demodulator is obtained after a very small number of iterations,
(b) provide an approximate analysis of the effect of least
squares estimation errors on acquisition performance (i.e., on
the probability of choosing the wrong hypothesis), and (c)
comment on directions for future research.
We present a blind adaptive interference suppression
algorithm for Direct-Sequence Code-Division Multiple-Access,
which is based on the Minimum Mean
Squared Error (MMSE) criterion. The algorithm is blind
in the sense that it does not require a training sequence,
although it does require (approximate) knowledge of the
user spreading waveform and associated timing. The
algorithm is related to the blind interference suppression
algorithm presented in [1], and assumes that the MMSE
filter is expressed as the sum of two orthogonal components:
the matched filter (referred to as the anchor) and an
adaptive filter. However, instead of using the minimum
variance (MV) criterion, as in [1], we consider an alternative
cost function which is closer to the actual MSE. This
cost criterion was proposed by Sato and Godard [2] for
blind equalization of a single-user channel. However,
without the orthogonal decomposition presented in [1],
this cost function is not suitable for the multi-user application
due to the presence of a local minimum associated
with each user.

The  orthogonally  anchored  Sato  cost  function  leads 
to  a  stochastic  gradient  (or  least  squares)  algorithm  that 
has  the  following  advantages  relative  to  the  MV  algo¬ 
rithms  in  [1]: 

•  The  algorithm  is  insensitive  to  mismatch  between 
the  anchor  and  desired  signal. 

•  Multipath  components  within  the  window  spanned 
by  the  filter  are  coherently  combined. 

•  The  stochastic  gradient  algorithm  produces  (much) 
less  asymptotic  MSE  than  the  MV  stochastic  gradi¬ 
ent  algorithm  for  the  same  speed  of  convergence. 

A  disadvantage  associated  with  this  cost  function  is 
that  there  is  a  local  minimum  associated  with  each  user. 
However,  if  the  crosscorrelation  between  any  pair  of  pulse 
shapes  is  small,  then  the  orthogonal  anchor  ensures  that 
the  norm  of  the  coefficient  vector  that  achieves  any  of 
these  local  minima  must  be  very  large.  These  local  min¬ 
ima  can  therefore  be  excluded  by  an  appropriate  norm 
constraint  on  the  vector  of  filter  coefficients. 

Orthogonally  Anchored  Adaptive  Algorithm 

Consider  a  synchronous  DS-CDMA  system  where 
the  vector  of  received  samples  corresponding  to  the  ith 


transmitted  bit  at  the  output  of  the  chip  matched  filter  is 
given  by 

$$r[i] = \sum_{k=1}^{K} b_k[i] A_k s_k + n[i] \qquad (1)$$

where $K$ is the number of users, $r$ has $N$ components, $N$
being the processing gain, $\{b_k[i]\}$ is the sequence of
binary symbols corresponding to user $k$, $s_k$ is the spreading
code for user $k$ where $\|s_k\| = 1$, $A_k$ is the amplitude for
user $k$, and $n$ is a noise vector.

The linear MMSE detector for user 1 consists of the
coefficient vector $c_1$ that minimizes $E[(b_1[i] - c_1^T r[i])^2]$.
To obtain the blind algorithm, $c_1$ is constrained to be of the
form $c_1 = \hat{s}_1 + w_1$, where $\hat{s}_1$ is an estimate of $s_1$ and
$w_1^T \hat{s}_1 = 0$. $c_1$ is then chosen to minimize the Sato cost
function

$$J(c_1) = E\left[\left(c_1^T r[i] - \mathrm{sgn}(c_1^T r[i])\right)^2\right] \qquad (2)$$

where $\mathrm{sgn}(x) = x/|x|$.


A  stochastic  gradient  algorithm  that  minimizes  (2), 
subject  to  the  preceding  orthogonal  decomposition,  is 
given  by 


$$w[i] = w[i-1] - \mu\, e[i] \left( r[i] - (r^T[i]\, \hat{s}_1)\, \hat{s}_1 \right) \qquad (3)$$

where $e[i] = c_1^T r[i] - \mathrm{sgn}(c_1^T r[i])$, and $\mu$ is the step-size.
(The MV stochastic gradient algorithm presented in [1]
simply replaces $e[i]$ by the output sample $c_1^T r[i]$.) A least
squares  adaptive  algorithm  based  on  the  preceding 
approach  is  easily  derived.  Numerical  examples  compar¬ 
ing  the  performance  of  these  algorithms  with  the  MV 
algorithm  in  [1],  and  with  the  conventional  LMS  algo¬ 
rithm  will  be  presented  at  the  conference. 
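
The stochastic gradient recursion (3) can be sketched as follows. This is a minimal illustration under assumptions made here, not the authors' code: real-valued signals, a unit-norm anchor, an all-zero initial adaptive component, and the hypothetical function name `anchored_sato_sg`.

```python
import numpy as np

def anchored_sato_sg(r, s_hat, mu=0.01):
    """Orthogonally anchored Sato stochastic gradient (illustrative sketch).

    r     : (T, N) array, row i is the received vector r[i] (assumed real).
    s_hat : anchor, an estimate of the desired spreading code s_1.
    mu    : step size.
    Returns (c, w) with c = s_hat + w and w orthogonal to s_hat.
    """
    s_hat = np.asarray(s_hat, dtype=float)
    s_hat = s_hat / np.linalg.norm(s_hat)      # enforce unit norm on the anchor
    w = np.zeros_like(s_hat)                   # adaptive component, starts at zero
    for r_i in np.asarray(r, dtype=float):
        y = (s_hat + w) @ r_i                  # filter output c^T r[i]
        e = y - np.sign(y)                     # Sato error e[i]
        r_perp = r_i - (r_i @ s_hat) * s_hat   # part of r[i] orthogonal to the anchor
        w = w - mu * e * r_perp                # update (3); w stays orthogonal to s_hat
    return s_hat + w, w
```

Every update direction is orthogonal to the anchor, so the constraint $w^T \hat{s}_1 = 0$ holds at each step without an explicit projection of $w$. Replacing `e` by `y` in the update would give the MV recursion mentioned in the text.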
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I.  INTRODUCTION 

During  the  last  few  years  there  has  been  much  work  on  multi-user 
detection  (MUD)  for  Direct  Sequence  Code  Division  Multiple 
Access  (DS/CDMA)  systems,  and  several  solutions  have  been 
presented [1]. Another active field of research considers methods
for  combined  source  and  channel  coding  in  vector  quantization 
(VQ)  [2].  The  present  paper  combines  these  two  areas.  We  present 
a  method  for  robust  transmission  of  VQ-data  over  a  CDMA 
channel.  Our  approach  differs  from  most  prior  work  in  two  ways: 

(1)  The  decorrelation  of  the  users  and  the  decoding  of  the  VQs  are 
carried  out  simultaneously;  (2)  The  decoding  is  based  on  the 
unquantized  matched  filter  outputs,  and  no  binary  decisions  are 
taken.  We  use  the  term  soft  decoding  to  emphasize  this  latter  fact. 
Thus,  our  approach  considers  estimation  based  rather  than 
detection  based  decoding  of  the  channel  and  the  VQs.  Similar 
studies  for  single-user  channels  can  be  found  in  [3],  [4],  and  [5]. 

II.  System  model 

Consider a symbol synchronous CDMA system with $K$ users. User
$k$ produces a sample vector $X_k$, which is encoded into an index $I_k$
by the VQ encoder of user $k$. The index is thereafter converted into
a block $b(I_k)$ of $L$ bits in polar format $\{\pm 1\}$. For simplicity we
assume that all users have the same block length $L$. The bits are
transmitted one by one on a CDMA channel that is distorted by
AWGN. Thus, the matched filter outputs of the received signal at
time $n$ can be expressed as (c.f. [1]) $Y_n = R W b_n + N$, where $R$
is the cross-correlation matrix between the different spreading
codes of the users and $W = \mathrm{diag}(w_1, \ldots, w_K)$ is the amplitude matrix,
where $w_k$ denotes the amplitude of user $k$. All user bits at time $n$
are represented by the vector $b_n = (b_n(I_1), \ldots, b_n(I_K))^T$, where $b_n(I_k)$
denotes the $n$th bit of user $k$. The channel noise vector $N$ is white
and composed of Gaussian zero mean variables with variances $\sigma^2$.

III.  Optimal  soft  Decoding 

For decoder design we adopt the minimum mean square error
(MMSE) criterion. That is, the decoder $\hat{X}_k(Y)$, for user $k$, is
designed to minimize $E\|X_k - \hat{X}_k(Y)\|^2$, where $Y = (Y_1, \ldots, Y_L)$
denotes the matrix of matched filter outputs. The main result of this
paper is a formulation of the MMSE decoder based on estimates,
$\hat{b}_k(y_n) = \tanh(\sigma^{-2} \{(RW)^T y_n\}_k)$, of the individual bits, $b_n(I_k)$.
Here, $\{a\}_k$ denotes the $k$th element of the vector $a$. The derivation
is based on the Hadamard transform description of a VQ [3]. To
treat this in some more detail, let $\mathcal{I}$ be the super-index defined by
the binary forms of all the VQ encoder outputs, $I_k$, such that user 1
defines the $L$ least significant bits and user $K$ the $L$ most significant
bits of $\mathcal{I}$. Also let $c_{\mathcal{I}} = [(c^{(1)}_{I_1})^T, \ldots, (c^{(K)}_{I_K})^T]^T$ denote the vector
composed by the centroids, $c^{(k)}_i = E[X_k | I_k = i]$, of the VQs. Then
$c_{\mathcal{I}}$ can be described as $c_{\mathcal{I}} = T h_{\mathcal{I}}$, where $h_{\mathcal{I}}$ is the $\mathcal{I}$th column of a
size $2^{KL}$ Hadamard matrix and $T$ is a sparse transform matrix (c.f.
[3]). It is easy to show that the MMSE estimate of the input vectors
of all users is $\hat{X}(y) = T \hat{h}(y)$, where $\hat{h}(y) = E[h_{\mathcal{I}} | Y = y]$. This leads
to the expression

$$\hat{X}(y) = [(\hat{X}_1(y))^T, \ldots, (\hat{X}_K(y))^T]^T = T \cdot \frac{R_{hh}\, p(y)}{m_h^T\, p(y)}$$

for the MMSE decoder. Here $R_{hh} = E[h_{\mathcal{I}} h_{\mathcal{I}}^T \cdot f(\mathcal{I})]$ and
$m_h = E[h_{\mathcal{I}} \cdot f(\mathcal{I})]$, where $f(\mathcal{I}) = \exp(-(2\sigma^2)^{-1} \sum_n \|RW b_n\|^2)$.

Furthermore, the bit-estimates, $\hat{b}_k$, enter as $p(y) = p_K \otimes \cdots \otimes p_1$
where $p_k = (1, \hat{b}_k(y_L))^T \otimes \cdots \otimes (1, \hat{b}_k(y_1))^T$. Here $\otimes$ denotes the
Kronecker matrix product. Thus, the vector $p(y)$ consists of
products of bit-estimates, $\hat{b}_k(y_n)$, for all users at all different times.
We  name  our  decoder  the  Soft  Multi-User  Decoder  (SMUD).  The 
SMUD  performs,  as  noted  above,  combined  MSE-optimal  user 
decorrelation  and  VQ  decoding.  Note  that,  in  the  decoder 
expression,  only  the  vector  p(y)  depends  on  the  received  signal  y. 
Furthermore,  note  that  the  expectations  in  the  expressions  for  Rhh 
and  mh  are  taken  over  the  statistics  of  the  VQ  indices.  Thus,  the  a- 
priori  index  information  is  confined  to  these  quantities.  Since  the 
SMUD  is  MSE-optimal  it  shows  how  to  utilize  the  a-priori 
information  in  an  optimal  fashion  where  Rhh  and  mh  are  used  to 
modify  the  statistic  p(y)  to  account  for  the  source  statistics.  This  is 
in  contrast  to  systems  where  VQ  decoding  is  based  on  an  ML- 
decision,  not  taking  the  source  statistics  into  account.  Note  also 
that,  since  the  Hadamard  transform  is  a  fast  transform,  the 
calculation of $\hat{h}(y)$ from the received signal, $y$, can be carried out
using an order of $KL \cdot 2^{KL}$ operations [6].
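
As a rough illustration of the received-signal-dependent part of the decoder, the sketch below computes the soft bit estimates and assembles the vector $p(y)$ by repeated Kronecker products. The function names, the array layout (column $n$ of `Y` holds $y_n$), and the ordering of the Kronecker factors (user 1 and time 1 in the least significant positions) are assumptions made for this example.

```python
import numpy as np

def soft_bits(Y, R, W, sigma2):
    """Soft bit estimates tanh(sigma^-2 {(RW)^T y_n}_k) (illustrative sketch).

    Y : (K, L) array whose column n is the matched filter output y_n.
    Returns a (K, L) array; entry [k, n] estimates bit n of user k.
    """
    return np.tanh((R @ W).T @ Y / sigma2)

def p_vector(b_hat):
    """Assemble p(y) = p_K (x) ... (x) p_1 from the soft bits.

    Each p_k is (1, b_k(y_L)) (x) ... (x) (1, b_k(y_1)); the result has
    2**(K*L) entries, every one a product of a subset of the soft bits.
    """
    K, L = b_hat.shape
    p = np.ones(1)
    for k in range(K):                       # user 1 first: least significant bits
        p_k = np.ones(1)
        for n in range(L):                   # time 1 first within each user
            p_k = np.kron(np.array([1.0, b_hat[k, n]]), p_k)
        p = np.kron(p_k, p)
    return p
```

The vector grows as $2^{KL}$, so this direct construction is only practical for small $K$ and $L$; it is meant to show the structure of $p(y)$ rather than an efficient implementation.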

IV.  NUMERICAL  SIMULATIONS 
We have compared the SMUD to the Maximum Likelihood Multi-User
Decoder (ML-MUD) [1] in combination with table-look-up
VQ decoding on a CDMA system with 2 users having the same
transmission energy ($w_1^2 = w_2^2$). The cross-correlation between the
users is 0.7. A VQ trained for a first order Gauss-Markov source
with correlation $\rho = 0.9$ was utilized for both users, and we used
the sample vector dimension 6 and the block length $L = 6$ bits. The
performance measure is the signal-to-noise ratio (SNR) at the
output of the decoder at a given Channel-SNR (CSNR), $w_k^2/\sigma^2$.
The performance of the decoders is shown in the table below. As
seen, the SMUD outperforms the ML-MUD by more than 3 dB at
low CSNRs. We have also observed that the performance gain
increases with increasing cross-correlation between users, lower
CSNRs, and lower VQ output entropies. Furthermore, near-far
resistance for the SMUD has been concluded from simulations.


CSNR (dB)       -1      3      7     11     15
SMUD (dB)     3.10   4.78   6.53   8.34  10.32
ML-MUD (dB)  -0.32   1.32   3.43   6.33   9.94
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Abstract  —  We  consider  Code-division  multiple- 
access  (CDMA)  systems  with  continuous  phase  mod¬ 
ulation  (CPM).  In  particular,  two  multiuser  detec¬ 
tion  algorithms  with  linear  computational  complexity 
are  proposed  for  a  synchronous  system.  We  demon¬ 
strate  that  the  choice  of  an  appropriate  set  of  deci¬ 
sion  statistics  is  crucial  for  detection  and  we  derive 
an  efficient  representation.  The  analysis  is  performed 
for  two  signal  formats  which  exhibit  different  spec¬ 
tral  and  error  rate  characteristics.  We  determine  the 
code  design  that  maximizes  the  minimum  Euclidean 
distance and show that the resulting CPM/CDMA
signals can achieve significant performance improvements
over conventional CDMA signals.

I.  Signal  Model 

CPM/CDMA signals are an attractive choice for communications
over predominantly bandwidth and power limited channels
since they combine the merits of both techniques. In
particular,  CDMA  offers  a  series  of  desirable  properties  that 
include  increased  capacity,  inherent  diversity  against  multi- 
path  fading  and  the  ability  to  coexist  with  narrowband  inter¬ 
ference.  CPM  provides  signals  with  compact  spectral  char¬ 
acteristics  that  maintain  a  constant  envelope  and  hence  are 
immune to nonlinear distortions and easily amplified [1].

We  consider  a  CPM/CDMA  system  with  K  active  users. 
The  kth  transmitted  signal  is  given  as 

$$s_k(t, b_k) = \sqrt{2 w_k} \cos\left(2\pi f_c t + \phi(t, b_k, c_k, h) + \theta_{k,0}\right) \qquad (1)$$

where $b_k = (\ldots, b_k(-1), b_k(0), b_k(1), \ldots)$ is the transmitted
data sequence with $b_k(m) \in \{-1, 1\}$, $c_k = (c_k(1), \ldots, c_k(N_c))$
is the spreading code of length $N_c$ with $c_k(n) \in \{-1, 1\}$, and
$h$ is the modulation index [1]. The signal power is $w_k$, the
carrier frequency is $f_c$ and $\theta_{k,0}$ is an arbitrary constant initial
phase. The phase function $\phi(t, b_k, c_k, h)$ contains all the
information and its construction defines the signal format. The
first  format  examined  in  this  paper  is  similar  to  conventional 
CDMA  in  the  sense  that  only  one  code  is  assigned  to  each 
user.  Under  that  scenario  however,  only  modest  gains  can  be 
achieved  in  the  error  rate  performance  relative  to  conventional 
CDMA  [2].  Another  format  introduced  in  [2]  for  a  memoryless 
CPM/CDMA  system,  considers  the  case  where  each  user  has 
available  a  distinct  pair  of  codes.  The  code  that  is  used  is  de¬ 
termined  by  the  transmitted  information  bit  and  the  objective 
is  to  minimize  the  error  probability.  We  discuss  how  to  con¬ 
struct  such  codes,  depending  on  the  CPM  parameterization, 
and  provide  the  lower  bound  for  the  error  rate. 

II.  Multiuser  Detection 

We  assume  that  all  users  employ  the  same  type  of  CPM  [1], 
and  that  the  signals  are  transmitted  in  a  synchronous,  addi¬ 
tive  white  Gaussian  noise  channel.  Similarly  to  conventional 

1This  work  is  supported  by  the  Advanced  Technology  Program 
of  the  Texas  Higher  Education  Coordinating  Board  under  Grant 
003604-018 


CDMA,  the  complexity  of  the  optimum  detector  increases  ex¬ 
ponentially  with  the  number  of  users  and  the  number  of  trellis 
states.  Clearly,  the  implementation  of  the  optimum  detector 
is  impractical  and  this  motivates  the  need  to  develop  linear 
multiuser  detectors.  To  achieve  linear  complexity  and  near¬ 
optimum  performance,  a  suboptimum  detector  must  decou¬ 
ple  the  multiuser  detection  problem  and  subsequently  perform 
single-user  detection  by  individually  recovering  the  metrics  of 
each  user.  The  optimal  single-user  detector  can  then  be  recur¬ 
sively  implemented  using  the  Viterbi  algorithm.  For  single- 
user  detection,  each  path  metric  becomes  equivalent  to  the 
correlation  between  the  received  signal  and  the  corresponding 
estimated transmitted signal. Denoting by $\theta_i(m)$ the $i$th trellis
state during the $m$th bit interval, the branch metric of the $k$th
user that is associated with the transmission of $b_k(m) = \pm 1$
from the $\theta_i(m)$ state is given as

$$L_k(b_k(m), \theta_i(m)) = \sum_{j=1}^{K} L_{k,j}(b_k(m), \theta_i(m)) + n(m) \qquad (2)$$

where $L_{k,k}(b_k(m), \theta_i(m))$ is the metric of the desired signal,
$L_{k,j}(b_k(m), \theta_i(m))$, $j \neq k$, are the interference metric
components and $n(m)$ is zero-mean Gaussian noise. Naturally,
the  objective  of  the  multiuser  detector  is  to  remove  the  in¬ 
terference  component  from  the  metrics  of  the  desired  user. 
However,  an  attempt  to  directly  evaluate  the  effects  of  the 
interference  on  the  metrics  of  each  user  leads  to  prohibitively 
complex  expressions  for  the  decision  statistics  and  an  alterna¬ 
tive  approach  is  necessary.  We  prove  that  the  decision  statis¬ 
tics  can  be  considerably  simplified  if  they  are  expressed  in 
terms  of  the  difference  and  the  sum  of  the  two  branch  met¬ 
rics  that  emanate  from  a  common  trellis  state.  That  linear 
transformation  reduces  the  complexity  of  the  multiuser  detec¬ 
tor  while  preserving  the  metric  information  required  by  the 
Viterbi  algorithm. 

We  propose  two  linear  multiuser  detection  algorithms  that 
are  based  on  properties  of  the  decision  statistics  and  uti¬ 
lize  concepts  applied  in  multiuser  detection  of  conventional 
CDMA  signals.  Both  algorithms  achieve  near-optimum  per¬ 
formance  and  can  be  employed  for  either  signal  format.  We 
derive  the  conditions  that  maximize  the  minimum  Euclidean 
distance  and  evaluate  the  optimum  performance  which  can 
exhibit  a  gain  that  approaches  3  dB  over  binary  antipodal 
signaling.  The  strict  dependence  between  spectral  and  error 
probability  performance  that  exists  in  typical  CPM  signals  is 
largely  decoupled  and  CPM/CDMA  signals  allow  considerable 
flexibility  in  selecting  a  parameterization  that  satisfies  certain 
spectral,  error  rate  and  complexity  constraints. 
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Abstract  —  In  this  paper  a  near  ideal  noise  whiten¬ 
ing  filter  for  a  time- varying  CDMA  system  is  consid¬ 
ered.  The  structure  of  the  ideal  noise  whitening  filter 
is  studied  and  the  metric  function  for  tree  search  de¬ 
tection  is  derived.  The  ideal  noise  whitening  filter  for 
a  time-varying  CDMA  system  depends  on  unknown, 
future  system  parameters  and  is  therefore  difficult  to 
realize.  A  near  ideal,  realizable  noise  whitening  filter 
is  proposed  as  a  solution. 

I.  Introduction 

Recently  joint  multiuser  detection,  in  which  the  multiuser  in¬ 
terference  is  treated  as  a  part  of  the  information  rather  than 
noise,  has  attracted  much  attention.  The  work  of  Verdu  [1] 
has  shown  that  optimum  near-far  resistance  and  a  significant 
performance  improvement  over  the  conventional  detector  is 
achieved  by  an  optimum  maximum  likelihood  multiuser  de¬ 
tector.  The  substantial  improvements,  however,  are  obtained 
at  the  expense  of  a  dramatic  increase  in  computational  com¬ 
plexity.  The  complexity  grows  exponentially  with  the  number 
of  users.  Thus,  when  the  number  of  users  is  large,  the  op¬ 
timum  detector  becomes  infeasible.  It  is  therefore  desirable 
to  use  a  near  optimum,  low  complexity  detector  for  CDMA 
systems  with  a  large  number  of  users.  Many  low  complex¬ 
ity  multiuser  detectors  have  been  proposed  (see  references  in 
[2]).  Sub-optimal  tree  search  algorithms  such  as  sequential 
detection  and  the  M-algorithm  are  especially  promising.  The 
IDDFD  detector  suggested  by  Wei  and  Schlegel  [3]  is  essen¬ 
tially  the  M-algorithm  applied  over  all  users  in  a  given  time 
slot. 

II. M-Algorithm Detection

Wei  et  al.  [2]  have  shown  that,  in  contrast  to  the  case  of  the 
optimum  multiuser  detector,  the  choice  of  the  receiver  filter 
severely  influences  the  performance  of  sub-optimum  multiuser 
detectors.  Detectors  based  on  the  M-  or  the  T-algorithms  and 
a  noise  whitening  receiver  filter  generally  perform  better  than 
similar  detectors  using  only  the  matched  filter.  The  M-  and 
T-algorithm  detectors  based  on  noise  whitening  filter  outputs 
can  achieve  near  optimum  performance  at  a  very  low  complex¬ 
ity  compared  to  the  optimum  detector.  The  M-algorithm  can 
easily  be  applied  to  a  time-invariant,  asynchronous  CDMA 
system,  assuming  that  the  noise  whitening  filter  exists.  In  a 
practical  system  the  noise  whitening  filter  is  related  to  time- 
varying  system  parameters.  Time  variations  such  as  arrival 
and  departure  of  users,  random  signature  waveforms,  and  mul¬ 
tipath  effects  make  it  necessary  to  derive  the  noise  whiten¬ 
ing  filter  following  each  system  change.  Wijayasuriya  et  al. 

1  This  work  was  supported  in  parts  by  Telecom  Australia  under 
Contract  No.  7368  and  by  the  Commonwealth  of  Australia  under 
International S & T Grant No. 56. The results of this work form
parts of Australian Provisional Patent Application No. PM9548/94.


[4] have suggested the sliding window decorrelating receiver.
However, the derivation of adaptive filters is not easily
accommodated  using  this  technique.  In  the  control  theory 
area,  a  factorization  method  has  been  suggested  by  Youla  and 
Kazanjian  [5].  An  alternative  method  has  been  suggested  by 
Alexander  and  Rasmussen  to  factorize  the  CDMA  multiuser 
channel  [6]. 
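
To make the tree search concrete, the following sketch applies the M-algorithm to the outputs of a noise whitening filter for a single symbol interval. It assumes a simplified synchronous model $y = F b + n$ in which $F$ is lower triangular (for example a Cholesky factor of the correlation matrix) and the symbols are binary; the function name and this model are illustrative assumptions, not the time-varying asynchronous filter developed in the paper.

```python
def m_algorithm(y, F, M):
    """M-algorithm tree search on whitened outputs (illustrative sketch).

    y : list of K whitened filter outputs.
    F : K x K lower triangular matrix (list of lists), assumed model y = F b + n.
    M : number of partial paths retained at each tree depth.
    Returns (best symbol sequence, its accumulated squared-error metric).
    """
    paths = [((), 0.0)]                       # (symbols decided so far, metric)
    for i in range(len(y)):
        extended = []
        for bits, metric in paths:
            for b in (-1, 1):                 # extend each survivor by one user
                cand = bits + (b,)
                # Lower triangular F: output i depends only on users 0..i.
                pred = sum(F[i][l] * cand[l] for l in range(i + 1))
                extended.append((cand, metric + (y[i] - pred) ** 2))
        extended.sort(key=lambda p: p[1])     # smallest metric first
        paths = extended[:M]                  # keep the M best partial paths
    best_bits, best_metric = paths[0]
    return list(best_bits), best_metric
```

The triangular structure is what makes the metric additive over tree depth, which is why a whitening front end suits this kind of search better than raw matched filter outputs. Complexity grows as $M$ times the number of users rather than exponentially.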

III.  Near  ideal  filter 

In  this  paper,  we  show  that  the  method  of  Youla  and  Kazan¬ 
jian  can  be  generalized  to  derive  a  near  ideal  noise  whitening 
filter for a time-varying asynchronous CDMA system. The
structure  of  the  ideal  noise  whitening  filter  is  studied  and 
the  metric  function  for  the  M-algorithm  based  on  the  ideal 
noise  whitening  filter  is  derived.  A  near  ideal,  realizable  noise 
whitening  filter  is  then  introduced.  The  convergence  of  the 
factorization method for a time-varying CDMA system is considered.
The truncation of the number of taps of the ideal
noise  whitening  filter  is  studied  and  the  metric  function  for 
the  M-algorithm  based  on  the  near  ideal  noise  whitening  fil¬ 
ter  is  formulated.  Simulation  results  are  obtained  for  5,  7 
and  10-user  time-varying  CDMA  systems  with  binary  ran¬ 
dom  signature  sequences  of  length  10  and  a  rectangular  chip 
waveform.  The  results  show  that  the  near  ideal  noise  whiten¬ 
ing  filter  can  accurately  approximate  the  ideal  noise  whitening 
filter  at  a  low  complexity  level.  The  performance  degradation 
of  a  time-varying,  asynchronous  CDMA  system  using  a  typi¬ 
cal  near  ideal  noise  whitening  filter  is  minimal  compared  to  a 
system  using  the  ideal  noise  whitening  filter. 
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SUMMARY:  Over  the  past  few  years  research  into  multi¬ 
user  receivers  for  code-division  multiple-access  (CDMA) 
networks has become increasingly popular. Multi-user
detectors treat the interference from other users in
the  same  frequency  band  as  an  information  bearing  part 
of  the  signal,  rather  than  as  noise. 

It  is  known  that  optimal  near-far  resistance  and  sig¬ 
nificant  performance  improvements  can  be  achieved  by 
an  optimal  multi-user  detector  [1].  These  improvements, 
however,  are  achieved  at  the  expense  of  a  dramatic  in¬ 
crease  in  computational  complexity,  which  grows  expo¬ 
nentially  with  the  number  of  users,  making  the  optimum 
detector  an  unachievable  theoretical  concept.  It  becomes 
desirable  to  use  near-optimum,  low  complexity  detectors 
instead,  and  a  number  of  sub-optimal  approaches  to  the 
detection problem have been studied in detail. Surprisingly,
near-optimal performance for uncoded CDMA can
be  achieved  with  a  non-linear  tree-search  detector,  whose 
complexity  increases  only  linearly  in  the  number  of  users 
[2]. 

While  all  these  studies  were  undertaken  for  uncoded 
systems,  the  application  of  forward  error  control  coding 
(FEC)  to  improve  performance  and  system  capacity  re¬ 
mains  a  largely  open  area  of  future  research.  In  this 
paper  we  study  a  very  promising  detector  structure  for 
coded  CDMA,  termed  projection  receiver  (PR),  whose 
structure  is  built  on  the  decorrelating  detector  [3].  In 
the  PR  the  effect  of  the  interfering  users  is  accounted  for 
by  metric  adjustments,  upon  which  the  error  control  de¬ 
coder  operates.  The  actual  interference  resolution  has  a 
complexity  which  is  proportional  to  the  number  of  users, 
and  all  that  is  required  are  single  user  error  control  de¬ 
coders. 

The  simple  addition  of  an  FEC  system  to  an  un¬ 
coded  multi-user  receiver  may  not  lead  to  the  best  per¬ 
formance.  This  is  evidenced  in  the  performance  plots 
shown  in  Figure  1,  where  three  systems  are  compared  for 
synchronous  CDMA  using  length  N  =  31  random  signa¬ 
ture  sequences  and  64-state  convolutional  error  control 
codes.  Application  of  FEC  to  the  conventional  detector 
(correlation  detector)  leads  to  the  poorest  performance. 
FEC  and  decorrelation  works  better,  but  the  PR  per¬ 
forms  best.  It  virtually  achieves  the  single  user  bound, 
which  is  the  theoretical  performance  limit  for  multi-user 
CDMA, for an arbitrary number of users up to a fully


loaded  system.  That  is,  the  PR  effectively  eliminates 
multi-user  interference. 


Figure  1:  Performance  of  coded  CDMA  multi-user 
systems  as  a  function  of  system  load. 

The  projection  receiver  linearly  projects  the  effects 
of  the  interfering  users  onto  the  complement  of  the  sub¬ 
space  spanned  by  those  users.  In  effect  the  PR  decorre¬ 
lates  the  unwanted  users.  From  this  an  adjusted  metric 
results,  which  has  the  form  of  a  diversity  metric,  and 
is  used  in  the  FEC  decoder  of  the  desired  user.  As 
evidenced  by  Figure  1,  this  approach  achieves  single- 
user  performance  on  an  additive  white  Gaussian  noise 
(AWGN)  channel.  In  this  presentation  we  will  present 
the  theory  of  the  PR,  performance  results  and  discuss 
adaptive  implementations  of  the  detector  which  are  suit¬ 
able  for  VLSI  implementations. 
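
The projection step can be illustrated with a short NumPy sketch. It assumes a synchronous, real-valued model in which the interfering signature sequences are known and stacked as the columns of a matrix; the names `project_out` and `pr_statistic` are hypothetical, and the statistic shown is only the projected correlation, not the full adjusted metric passed to the FEC decoder.

```python
import numpy as np

def project_out(r, S_int):
    """Project r onto the orthogonal complement of the columns of S_int.

    r     : length-N received vector.
    S_int : (N, J) matrix whose columns are the interfering signatures.
    """
    # Least squares fit of r by the interferers, then remove that fit.
    coeffs, *_ = np.linalg.lstsq(S_int, r, rcond=None)
    return r - S_int @ coeffs

def pr_statistic(r, s1, S_int):
    """Correlate the desired signature with the projected received vector."""
    return float(s1 @ project_out(r, S_int))
```

Since the interferers lie entirely in the subspace that is removed, the statistic does not depend on their amplitudes, which is the near-far resistance property. The price is a loss of the part of the desired signature that lies in the interference subspace.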
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Abstract  —  For  transmission  of  speech  or  multime¬ 
dia  information  in  a  time  varying  mobile  channel  fixed 
rate  codes  are  normally  used  designed  for  average  or 
worst  channel  conditions.  However,  fixed  rate  codes 
fail to exploit the time varying nature of the mobile
channel. In this report we propose an adaptive multilevel
coding scheme for Code Division Multiple Access
(CDMA) which is associated with co-channel interference
(CCI) cancellation to exploit the time varying
nature of the radio link.

I.  Introduction 

For efficient usage of available spectrum and to exploit
the time varying nature of the mobile radio link, an adaptive
coding/modulation (codulation) scheme may be employed
[1] [2]. In this paper, which is essentially an extension
of our previous work [3], we take into account channel
state information (CSI) and propose an adaptive multilevel
coding scheme associated with multi-user interference
cancellation for CDMA, which yields significant performance
improvement.

II.  System  Model 

The information stream of each user is stored in a buffer
prior to transmission, from which information is sent
adaptively according to the channel condition. Adaptation
can be done symbol by symbol or block by block
according to CSI. We assume both transmitter and receiver
can sense any change in channel condition at discrete
instants of symbol transmission. For adaptation we
change the number of encoded levels according to CSI,
with the modulation format held fixed. The transmitter
decides what overall rate should be transmitted according
to a set of thresholds chosen to keep the BER below a
certain level. All the transmitters first look at the
immediate values of the fading multiplicative distortions. For a
three level 8PSK, if $0 < \max(\alpha_i) < \mu_1$, all three levels
are coded. The overall transmission rate is low in this
case, for the worst channel conditions. If $\mu_1 < \max(\alpha_i) < \mu_2$,
the first two rows are encoded. Otherwise only one level is
encoded, with a much higher rate. The receiver of any
arbitrary $k$th user correlates the complex signal with each
of the possible signal points of the partitioned signal
constellation after despreading. Using the channel history,
the corresponding decoding scheme is chosen and the first
component code is decoded. From all such precise decoded
information of all users, CCI is estimated and is
subtracted subsequently from the delayed version of the
received signal to obtain a more accurate estimate of the
received signal. This process is carried on until all the
component codes are decoded.
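
The threshold rule for choosing how many levels to encode can be sketched as below. The function names are hypothetical, the treatment of values exactly equal to a threshold is a choice made here (the text gives strict inequalities only), and the rate accounting in `overall_rate`, where each uncoded level of the 8PSK label carries one full bit per symbol, is an assumption rather than something stated in the abstract.

```python
from fractions import Fraction

def levels_to_encode(alpha_max, mu1, mu2):
    """Number of coded levels for a three level 8PSK scheme (sketch).

    alpha_max : current maximum fading amplitude, Max(alpha_i) in the text.
    mu1, mu2  : adaptation thresholds with 0 < mu1 < mu2.
    """
    if not 0 < mu1 < mu2:
        raise ValueError("thresholds must satisfy 0 < mu1 < mu2")
    if alpha_max < mu1:
        return 3          # worst channel: all three levels coded
    if alpha_max < mu2:
        return 2          # intermediate channel: first two levels coded
    return 1              # good channel: only the first level coded

def overall_rate(levels, code_rates):
    """Information bits per symbol, assuming uncoded levels carry 1 bit each."""
    coded = sum(code_rates[:levels])
    uncoded = len(code_rates) - levels
    return coded + uncoded
```

With component code rates of 1/2, 2/3 and 2/3, this accounting gives 5/2 bits per symbol when one level is coded and 11/6 bits per symbol when all three are coded, which matches the qualitative statement that the rate is lowest under the worst channel conditions.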


III.  Result  and  Conclusion 
For good channel condition we used one level coding using a
rate 1/2, $M = 4$ convolutional code with $d_{free} = 7$. As
the channel condition deteriorates we use 2nd and 3rd
level codes, which are rate 2/3 convolutional codes with
$M = 4$ and $d_{free} = 4$. The BER and throughput graphs
are shown in Fig. 1 for a total of 10 users. Fig. 2 shows
the performance in terms of throughput. We found
that about 10 dB gain can be achieved by using the adaptive
scheme compared to the fixed rate scheme at a BER of $10^{-3}$.


Fig. 1 BER and throughput curve of the proposed scheme
in Rayleigh fading


Fig.  2  Throughput  comparison  with  fixed  rate  scheme 
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Abstract  —  A  linear  decorrelator  detector  is  con¬ 
sidered  for  use  in  a  quasi-synchronous  code-division 
multiple  access  (QS-CDMA)  system.  For  long  code 
lengths,  the  evaluation  of  bit-error  rate  can  be  com¬ 
putationally  expensive,  due  to  the  need  for  exhaustive 
search  to  determine  the  worst  case  relative  delays. 
An  upper  bound  on  the  error  rate  based  on  eigen¬ 
value  bounds  is  presented  for  the  linear  decorrelator 
detector,  and  which  can  be  computed  solely  in  terms 
of  the  maximum  cross-correlation  between  codes  and 
the  number  of  users. 

I.  Introduction 

A  QS-CDMA  communication  system  is  considered  in  which 
decorrelators  are  employed  for  multiuser  detection  at  the  base 
station.  In  contrast  to  the  decorrelator-based  receiver  in  [1], 
the  delays  are  assumed  unknown  a-priori,  although  confined  to 
a  subinterval  of  the  bit  duration  due  to  the  quasi-synchronous 
assumption.  The  worst-case  bit-error  rate  (BER)  of  the  decor¬ 
relator  detector  can  be  evaluated  [2]  by  exhaustive  search 
over  the  relative  times-of-arrival  (code  delays)  of  the  users. 
However,  for  long  code  lengths,  such  an  approach  may  be 
extremely  time-consuming,  due  to  the  computational  burden 
of  evaluating  repeated  correlations.  Thus,  we  seek  an  upper 
bound  on  BER  that  can  be  used  for  long  PN  codes,  and  that 
does  not  require  determination  of  the  worst-case  code  delays. 

The signal model for the QS-CDMA system is first described.
Let $s_n(t)$ represent the direct-sequence signal transmitted
by the $n$-th user. The received Nyquist samples, where
the sampling interval is $T_c$ sec., are given by

$$r(kT_c) = \sum_{n=1}^{N} a_n s_n(kT_c - \tau_n) + n(kT_c), \qquad (1)$$

where $a_n \in \mathbb{C}$ is the complex amplitude associated with the
$n$-th user, and $\tau_n$ is the $n$-th delay. The additive noise sequence
$n(kT_c) \in \mathbb{C}$ is discrete-time white Gaussian. Due to the
quasi-synchronous assumption, $\tau_n \in [-MT_c, MT_c]$, where $MT_c \ll T$,
with $T$ the bit duration. It will be convenient to work with
the following vector model of the received signal during the
$k$-th bit duration.


$$r(k) = a_1 d_1(k) s_1(\tau_1) + \sum_{n=2}^{N} a_n s_n(\tau_n) + n(k), \qquad (2)$$

where the elements of $s_n(\tau_n) \in \mathbb{C}^L$ are the Nyquist samples
$s_n(kT_c - \tau_n)$.


1This  work  was  sponsored  in  part  by  Rockwell  International  Co. 
and  the  UC  MICRO  program. 


II.  Description  of  the  Decorrelator  Detector 
and  Upper  Bound 

An approximate maximum-likelihood receiver for the signal model (2) has been
previously derived [2], and is described by the following decision variable,
where it is assumed that $\mathbf{s}_1(\tau_1)$ is the desired signal.

$$U = \mathrm{Re}\{\mathbf{r}(k)^{\dagger} [I - P_{S_I}]\, \mathbf{s}_1(\tau_1)\, e^{j \arg a_1}\}, \qquad (3)$$

where $[I - P_{S_I}]$ is an orthogonal projection matrix which rejects the
undesired users. The projection matrix $P_{S_I}$ corresponds to the subspace
spanned by the signals $\mathbf{s}_n(mT_c)$. As shown in [2], undesired
vectors with delays $\tau_n$ falling between the discretized values $mT_c$
are nearly rejected, since they fall approximately in the subspace spanned by
the columns of $S_I$.

A bound is obtained for an SNR loss factor, defined in terms of
$\mathrm{SNR} = E\{U\}/\sqrt{2\,\mathrm{Var}\{U\}}$. Then the loss factor is
given by $LF = \mathrm{SNR}/\sqrt{E_b/N_0}$. Hence, if no loss in SNR when
compared with ideal BPSK occurs, $LF = 1$. A lower bound on the SNR is found
in terms of the following quantities $\gamma_1$ and $\gamma_2$ derived in
[3], where $t_{\max}$ denotes the maximum normalized cross-correlation
between codes.

~»«*(2M  +  1)(IV-1) 


71  =  1  - 


1  —  (2M  +  1)(N  —  1  )tmax' 

4a*(2M  +  l)(jV-l) 

r  1  —  ((2Af  +  1)(JV  —  1)  —  l)t„ 

The  term  £max  is  given  by 


72  —  tmax  T  E77 


(4) 


(5) 


L  —  M—m—l 


Ei  max  — 


sup 


—  —  k  —  M  —  m+ 1  n  =  —  00 


£  £  sinc(k  +  nL  —  e/Tc). 

(6) 


The  final  expression  for  the  loss  factor  is  then 


s  VtT 


(7) 


Note that when the actual delays $\tau_n$ equal the discretized values
$mT_c$, the term $\gamma_2 = 0$. Specific results for the loss factor are
evaluated for varying Gold code lengths and SNRs in [3]. In general, the
bound is useful for long PN codes, where exhaustive search to find the
worst-case relative delays is computationally prohibitive.

Abstract — A non-orthogonal synchronous direct sequence code division
multiple access (DS-CDMA) system with an additive white Gaussian noise (AWGN)
channel is presented where the suboptimal successive cancellation detector
performs optimally.

I.  Introduction 

Due to recent advances in cellular technology [1], DS-CDMA has been
considered as a multiple access method. It is well known that joint detection
of the users improves system capacity considerably [2]. The maximum
likelihood (ML) decision rule over all active users is optimal in the sense
of the estimation error rate, but in general too complex due to the
exponential dependence on the number of users. The complexity (in terms of
operations per bit decision) of successive cancellation is linear in the
number of users.

II.  System  Description 

Fig. 1 shows the considered synchronous DS-CDMA system model with AWGN
channel. On account of the synchronism and the memoryless channel, each bit
period can be considered independently. All vectors denote column vectors.
The users $1 \ldots K$ transmit bits $b_1, b_2, \ldots, b_K \in \{-1, 1\}$ by
modulating them onto user-specific spreading vectors
$\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_K$ of length $N$. The
components of the spreading vectors are considered to be real numbers and the
Euclidean norm of the vectors is equal for all vectors

$$|\mathbf{c}_k|^2 = \mathbf{c}_k^T \mathbf{c}_k = 1. \qquad (1)$$

The superscript $()^T$ denotes the transpose. The noise vector $\mathbf{n}$
is assumed to be Gaussian with covariance matrix $\sigma^2 I$ ($I$ is the
identity matrix). With $\mathbf{b}$ the transmitted bit vector and $C$ the
spreading code matrix, the received vector $\mathbf{r}$ can be written as

$$\mathbf{r} = C\mathbf{b} + \mathbf{n}, \quad \mathbf{b} \triangleq (b_1\ b_2 \cdots b_K)^T, \quad C \triangleq (\mathbf{c}_1 | \mathbf{c}_2 | \cdots | \mathbf{c}_K). \qquad (2)$$

The receiver's task is to estimate the bits $b_1, \ldots, b_K$ from the
observation of $\mathbf{r}$. The optimal decision rule for equiprobable input
bits is the ML rule, which minimizes the Euclidean distance between
$\mathbf{r}$ and $C\hat{\mathbf{b}}$, where $\hat{\mathbf{b}}$ is the
estimate of $\mathbf{b}$.

Definition 1 ML rule: Choose the estimated bit vector
$\hat{\mathbf{b}}_{ML}$ such that the Euclidean distance $e$ is minimized
with

$$e^2 \triangleq |C\hat{\mathbf{b}}_{ML} - \mathbf{r}|^2 = (C\hat{\mathbf{b}}_{ML} - \mathbf{r})^T (C\hat{\mathbf{b}}_{ML} - \mathbf{r}). \qquad (3)$$

A suboptimal decision rule called "successive interference cancellation" (SC
rule) first uses a bank of matched filters MF1 ... MFK to produce decision
variables $d_1, d_2, \ldots, d_K$ (see Fig. 1). The decision variable vector
$\mathbf{d}$ can be written as

$$\mathbf{d} \triangleq (d_1\ d_2 \cdots d_K)^T = C^T\mathbf{r} = C^T C\mathbf{b} + C^T\mathbf{n} = R\mathbf{b} + C^T\mathbf{n}. \qquad (4)$$


Figure 1: System model (DS-CDMA, AWGN channel)


Definition 2 SC rule: Let the reliability of a decision variable
rel($d_k$) be defined by the absolute value: rel($d_k$) = $|d_k|$. Let
$\mathbf{r}^{(1)} = \mathbf{r}$ and $S_s = \{k_1, k_2, \ldots, k_{s-1}\}$,
where $S_s$ is the set of indices for which a decision has been taken in
steps 1 up to step $s - 1$. Initially $S_1 = \{\}$. Then successive
cancellation chooses the estimates in $K$ steps as follows:

At step $s$ compute $\mathbf{d}$ from $\mathbf{r}^{(s)}$ and choose the
decision variable $d_k$ with highest reliability, taking into consideration
only indices $k \notin S_s$. Decide on bit $\hat{b}_k^{SC}$ using the sign
function

$$\hat{b}_k^{SC} = \mathrm{sgn}(d_k), \qquad (5)$$

form the set $S_{s+1} = S_s \cup \{k\}$ and compute $\mathbf{r}^{(s+1)}$ as

$$\mathbf{r}^{(s+1)} = \mathbf{r}^{(s)} - \hat{b}_k^{SC}\,\mathbf{c}_k. \qquad (6)$$


III.  Result 

Theorem: SC and ML rules are equivalent if the cross-correlation
$R = C^T C$ satisfies, for a constant $\rho$ with $|\rho| < 1$,

$$R_{i,j} = \begin{cases} \rho & i \ne j \\ 1 & \text{else} \end{cases} \qquad (7)$$


Proof.  The  proof  has  two  steps,  of  which  the  first  shows  that 
for  the  first  decision  in  successive  cancellation  the  resulting  bit 
estimate  is  equal  to  the  maximum  likelihood  estimate.  The 
second  step  shows  that  after  subtracting  the  influence  of  the 
estimated  bit  the  problem  is  principally  the  same,  only  the 
dimension  has  decreased  by  1. 
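
The two decision rules can be compared numerically with a minimal sketch such as the following. It is an illustration under stated assumptions rather than the authors' implementation: the helper names (`equicorrelated_codes`, `ml_detect`, `sc_detect`) are hypothetical, the spreading matrix is built from a Cholesky factor so that $C^T C$ has the equicorrelated form (7), and the code length is taken equal to the number of users for convenience.

```python
import itertools
import numpy as np


def equicorrelated_codes(K, rho):
    """Return a K x K matrix C with unit-norm columns and C^T C = R as in (7).

    Assumption: R must be positive definite, i.e. -1/(K-1) < rho < 1.
    """
    R = (1.0 - rho) * np.eye(K) + rho * np.ones((K, K))
    L = np.linalg.cholesky(R)  # R = L L^T
    return L.T                 # (L^T)^T (L^T) = L L^T = R


def ml_detect(C, r):
    """Exhaustive ML rule: minimize |C b - r|^2 over b in {-1, +1}^K."""
    K = C.shape[1]
    candidates = (np.array(b) for b in itertools.product((-1.0, 1.0), repeat=K))
    return min(candidates, key=lambda b: float(np.sum((C @ b - r) ** 2)))


def sc_detect(C, r):
    """Successive cancellation following Definition 2."""
    K = C.shape[1]
    b_hat = np.zeros(K)
    undecided = list(range(K))
    residual = np.array(r, dtype=float)
    for _ in range(K):
        d = C.T @ residual                               # matched-filter outputs
        k = max(undecided, key=lambda i: abs(d[i]))      # most reliable variable
        b_hat[k] = 1.0 if d[k] >= 0 else -1.0            # sign decision, eq. (5)
        residual = residual - b_hat[k] * C[:, k]         # cancellation, eq. (6)
        undecided.remove(k)
    return b_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C = equicorrelated_codes(4, 0.3)
    b = rng.choice([-1.0, 1.0], size=4)
    r = C @ b + 0.5 * rng.standard_normal(4)
    print("SC:", sc_detect(C, r))
    print("ML:", ml_detect(C, r))
```

The exhaustive search in `ml_detect` visits all $2^K$ bit vectors, while `sc_detect` performs $K$ matched-filter passes, which makes the complexity gap described in the Introduction concrete.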

Abstract — Let $(X, Y)$ be a pair of discrete random variables with $X$
taking values from a finite set. Suppose the value of $X$ is to be
determined, given the value of $Y$, by asking questions of the form 'Is $X$
equal to $x$?' until the answer is 'Yes.' Let $G(x|y)$ denote the number of
guesses in any such guessing scheme when $X = x$, $Y = y$. The main result is
a tight lower bound on nonnegative moments of $G(X|Y)$. As an application,
lower bounds are given on the moments of computation in sequential decoding.
In particular, a simple derivation of the cutoff rate bound for single-user
channels is obtained, and the previously unknown cutoff rate region of
multi-access channels is determined.

I.  The  Inequality 

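Theorem 1 For arbitrary guessing functions $G(X)$ and $G(X|Y)$, and any
$\rho \ge 0$,

$$E[G(X)^{\rho}] \ge (1 + \ln M)^{-\rho} \Big[\sum_{x \in \mathcal{X}} P_X(x)^{\frac{1}{1+\rho}}\Big]^{1+\rho} \qquad (1)$$

and

$$E[G(X|Y)^{\rho}] \ge (1 + \ln M)^{-\rho} \sum_{y \in \mathcal{Y}} \Big[\sum_{x \in \mathcal{X}} P_{X,Y}(x, y)^{\frac{1}{1+\rho}}\Big]^{1+\rho} \qquad (2)$$

where $P_{X,Y}$, $P_X$ are the probability distributions of $(X, Y)$ and
$X$, respectively, the summations are over all possible values of $X$, $Y$,
and $M$ is the number of possible values of $X$.

This result is a simple consequence of the following variant of Hölder's
inequality.

Lemma 1 Let $a_i$, $p_i$ be nonnegative numbers indexed over a finite set
$1 \le i \le M$. For any $0 < \lambda < 1$,

$$\sum_{i=1}^{M} a_i p_i \ge \Big[\sum_{i=1}^{M} a_i^{-\lambda/(1-\lambda)}\Big]^{-(1-\lambda)/\lambda} \Big[\sum_{i=1}^{M} p_i^{\lambda}\Big]^{1/\lambda}.$$

Proof. Put $A_i = a_i^{-\lambda}$, $B_i = a_i^{\lambda} p_i^{\lambda}$ in
Hölder's inequality
$\sum_i A_i B_i \le (\sum_i A_i^{1/(1-\lambda)})^{1-\lambda} (\sum_i B_i^{1/\lambda})^{\lambda}$.

Proof of Theorem. Inequality (1) is obtained by taking $a_i = i^{\rho}$,
$p_i = \Pr(G(X) = i)$, $\lambda = 1/(1+\rho)$ in the lemma, and noting that
$\sum_{i=1}^{M} 1/i \le 1 + \ln M$. Inequality (2) follows readily:

$$E[G(X|Y)^{\rho}] = \sum_{y} P_Y(y)\, E[G(X|Y = y)^{\rho}]$$

$$\ge \sum_{y} P_Y(y)\, (1 + \ln M)^{-\rho} \Big[\sum_{x} P_{X|Y}(x|y)^{\frac{1}{1+\rho}}\Big]^{1+\rho}$$

$$= (1 + \ln M)^{-\rho} \sum_{y} \Big[\sum_{x} P_{X,Y}(x, y)^{\frac{1}{1+\rho}}\Big]^{1+\rho}.$$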
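
Inequality (1) can be checked numerically for a given distribution. The sketch below is illustrative only: the function names are hypothetical, and the guessing function is assumed to be the one that guesses values in order of decreasing probability (the bound holds for any guessing order, so this is just one convenient choice).

```python
import math


def guessing_moment(p, rho):
    """E[G(X)^rho] when values are guessed in order of decreasing probability."""
    ordered = sorted(p, reverse=True)
    return sum(prob * (i ** rho) for i, prob in enumerate(ordered, start=1))


def guessing_lower_bound(p, rho):
    """Right-hand side of inequality (1): (1 + ln M)^(-rho) [sum_x P(x)^(1/(1+rho))]^(1+rho)."""
    M = len(p)
    s = sum(prob ** (1.0 / (1.0 + rho)) for prob in p)
    return (1.0 + math.log(M)) ** (-rho) * s ** (1.0 + rho)


if __name__ == "__main__":
    p = [0.5, 0.25, 0.125, 0.125]
    for rho in (0.5, 1.0, 2.0):
        print(rho, guessing_moment(p, rho), guessing_lower_bound(p, rho))
```

For each value of `rho` the first printed number should be at least as large as the second, as Theorem 1 requires.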


II.  Application  to  Sequential  Decoding 

To relate sequential decoding to guessing, let $\mathcal{X}$ denote the set
of nodes in a tree code at some level $N$ channel symbols into the tree from
the tree origin. Let $X$ be a random variable uniformly distributed on
$\mathcal{X}$, indicating the node in $\mathcal{X}$ which lies on the
transmitted path. Let $Y$ denote the received channel output sequence when
$X$ is transmitted. Let $G(x|y)$ denote the rank order in which node
$x \in \mathcal{X}$ is hypothesized (for the first time) by a sequential
decoder when $X = x$ and $Y = y$. Moments of $G(X|Y)$ serve as measures of
complexity for sequential decoding.

Let $M$ be the size of $\mathcal{X}$, and $R = (1/N) \ln M$ denote the code
rate. By Theorem 1 and the fact that $P_X(x) = 1/M$ for
$x \in \mathcal{X}$, for $\rho > 0$,

$$E[G(X|Y)^{\rho}] \ge (1 + NR)^{-\rho} \exp[\rho N R - E_0(\rho, P_X)]$$

where

$$E_0(\rho, P_X) = -\ln \sum_{y} \Big[\sum_{x} P_X(x)\, P_{Y|X}(y|x)^{\frac{1}{1+\rho}}\Big]^{1+\rho}.$$

Gallager [1, p. 149] shows that for discrete memoryless channels

$$E_0(\rho, P_X) \le N E_0(\rho)$$

where $E_0(\rho)$ equals the maximum of $E_0(\rho, Q)$ over all single-letter
distributions $Q$ on the channel input alphabet. Thus, at rates
$R > E_0(\rho)/\rho$, the $\rho$th moment of computation performed at level
$N$ of the tree code must go to infinity exponentially as $N$ is increased.
The infimum of all real numbers $R^*$ such that, at rates $R > R^*$,
$E[G(X|Y)^{\rho}]$ must go to infinity as $N$ is increased is called the
cutoff rate (for the $\rho$th moment) and denoted by $R_{cutoff}(\rho)$. We
have thus obtained the following bound.

Theorem 2 For any discrete memoryless channel with a finite input alphabet,

$$R_{cutoff}(\rho) \le E_0(\rho)/\rho, \quad \rho > 0. \qquad (3)$$

This result was proved earlier (for $\rho = 1$ only) in [2]; the present
proof is much simpler. Moreover, the above method extends to the case of
multiaccess channels, yielding their previously unknown cutoff rate region
[3].

Abstract — Coding methods for piecewise memoryless sources have recently
been studied by Merhav [2]. While Merhav mainly concentrated on the
single-transition case, we describe and analyse here coding techniques that
allow multiple transitions.

I.  Introduction 

A binary memoryless source generates the sequence $x_1 \cdots x_T$ with
probability

$$P_a(x_1 \cdots x_T) = \prod_{t=1}^{T} P_a(x_t).$$

The source parameter is piecewise constant. Suppose that the instants before
which transitions appear are $t_1, t_2, \cdots, t_C$, i.e.

$$P_a(X_t = 1) = \theta_c, \quad \text{if } t_c \le t < t_{c+1},$$

with $c = 0, 1, \cdots, C$, and $t_0 = 1$ and $t_{C+1} = T + 1$. We describe
a method for compressing the sequences generated by this source based on
arithmetic coding techniques (see [1]). It was our objective to construct a
coding distribution $P_c(\cdot)$ such that the maximal individual redundancy

$$\max_{x_1 \cdots x_T} \log \frac{P_a(x_1 \cdots x_T)}{P_c(x_1 \cdots x_T)}$$

is as small as possible. The base of the log is 2. Information quantities are
expressed in bits.


II. Coding method

We assume that the source moves in a graph (see below) from state to state.
It starts in state $(1, 0, 1)$ and generates the symbol $x_1$ according to
parameter $\theta_0$. When the source is in state $(t, c, t_c)$ it first
generates symbol $x_t$ according to parameter $\theta_c$. After this, its
parameter may change to $\theta_{c+1}$, in which case the source moves to
state $(t + 1, c + 1, t_{c+1} = t + 1)$, or it may not change, and then the
source moves to state $(t + 1, c, t_c)$. When the source is in state
$(t, c, t_c)$ we assume that

$$P_c(X_t = 1 \mid (t, c, t_c)) = \frac{b(x_{t_c} \cdots x_{t-1}) + 1/2}{t - t_c + 1},$$

where $b(x_{t_c} \cdots x_{t-1})$ is the number of ones in
$x_{t_c} \cdots x_{t-1}$, and

$$P_{tr}((t + 1, c + 1, t + 1) \mid (t, c, t_c)) = \frac{c + 1/2}{t}.$$

The source output $x_t$ and next state $(t + 1, p, q)$, where
$(p, q) = (c, t_c)$ or $(c + 1, t + 1)$, are assumed to be independent of
each other given the current state $(t, c, t_c)$.

Under these assumptions the coding probability for sequence
$x_1 \cdots x_t$ is the probability that the source moves along a certain
path to a state at depth $t$ and generates $x_1 \cdots x_t$, summed over all
paths and states.

Two remarks can be made about this coding distribution. The first is that
the coding distribution can easily be updated sequentially using the graph
structure. The second remark is that the coding distribution can be regarded
as a weighting (see [3]) over all coding distributions corresponding to fixed
transition patterns. The weighting coefficient of a pattern is determined by
the Krichevsky-Trofimov estimator. This makes it easy to study the redundancy
behavior of our method.

III. Performance analysis

If the actual source made $C$ transitions (pattern $\mathcal{T}$), our
method yields

$$\log \frac{P_a(x_1 \cdots x_T)}{P_c(x_1 \cdots x_T)} \le \log \frac{P_a(x_1 \cdots x_T)}{P_c(x_1 \cdots x_T \mid \mathcal{T})} + \log \frac{1}{P_{tr}(\mathcal{T})}$$

$$\le \frac{C+1}{2} \log \frac{T}{C+1} + C + 1 + C \log \frac{(T-1)e}{C} + \frac{1}{2} \log(T-1) + 1.$$

Parameter redundancy is as usual (i.e. roughly $\frac{1}{2} \log T$ per
parameter), the transition redundancy is roughly $\log T$ per transition,
plus a bias term of $\frac{1}{2} \log T$. Apart from this bias term our
method achieves the Merhav [2] lower bound. The bias term however is a
consequence of the fact that the number of transitions is assumed to be
unknown.
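
The sequential update of the coding distribution described in Section II can be sketched as a forward recursion over the states $(t, c, t_c)$. This is a minimal illustration, not the authors' implementation: the function name `coding_probability` and the dictionary-based state table are assumptions, and no attempt is made to reduce the quadratic number of states.

```python
def coding_probability(x):
    """P_c(x_1 ... x_T) for a binary list x, summed over all paths and states.

    A state at time t is keyed by (c, t_c): c transitions so far, the last one
    taking effect at time t_c (1-indexed). Weights are path probabilities.
    """
    states = {(0, 1): 1.0}  # the source starts in state (1, 0, 1)
    for t in range(1, len(x) + 1):
        nxt = {}
        for (c, tc), w in states.items():
            # Krichevsky-Trofimov estimate from the symbols x_{t_c} ... x_{t-1}
            ones = sum(x[tc - 1:t - 1])
            p_one = (ones + 0.5) / (t - tc + 1)
            w *= p_one if x[t - 1] == 1 else 1.0 - p_one
            # transition probability (c + 1/2) / t to state (t+1, c+1, t+1)
            p_tr = (c + 0.5) / t
            nxt[(c, tc)] = nxt.get((c, tc), 0.0) + w * (1.0 - p_tr)
            nxt[(c + 1, t + 1)] = nxt.get((c + 1, t + 1), 0.0) + w * p_tr
        states = nxt
    return sum(states.values())


if __name__ == "__main__":
    print(coding_probability([0, 0, 0, 0, 1, 1, 1, 1]))
    print(coding_probability([0, 1, 0, 1, 0, 1, 0, 1]))
```

Because both the symbol probabilities and the transition probabilities sum to one in every state, the values returned for all binary sequences of a fixed length add up to one, which is what an arithmetic coder needs from a coding distribution.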

IV.  Complexity 

The storage complexity of this method grows quadratically in the sequence
length $T$. The computational complexity is cubic in $T$.

There exists a simplified version of the method described here which has
storage behavior linear in $T$ and for which the number of computations is
quadratic in $T$. For this method, however, the redundancy is roughly
$\frac{3}{2} \log T$ per transition.

Abstract — We briefly describe the results in [4].

I.  Introduction  and  the  main  result 

Let us consider an information source $S$ with probability distribution
$P = \{p(i)\}_{i=0}^{\infty}$ on the infinite source alphabet
$\mathcal{X} = \{0, 1, 2, \ldots\}$, where we assume throughout the
positivity: $p(i) > 0$ for all $i \in \mathcal{X}$, as well as the
monotonicity:

$$\min_{0 \le i \le k_0 - 1} p(i) \ge p(k_0) \ge p(k_0 + 1) \ge p(k_0 + 2) \ge \cdots \qquad (1)$$

for some $k_0 \in \mathcal{X}$.

We shall consider prefix codings for the distribution $P$ using the $D$-ary
code alphabet $V = \{0, 1, \ldots, D - 1\}$, where $D$ is an integer such
that $D \ge 2$. Our concern is how to construct an optimal $D$-ary prefix
code given a probability distribution $P$ on $\mathcal{X}$, where "optimal"
means a code with the minimum expected codeword length over all the possible
prefix codes for $P$. Although the Huffman coding algorithm with finite
source alphabet is known to achieve the optimal code, it is not applicable in
general to the case with infinite source alphabet. However, some specific
properties of the given distribution $P$ are enough to ensure the
applicability of Huffman-type coding algorithms also in the infinite alphabet
case: for instance, see Gallager and Van Voorhis [2], Humblet [3], and
Abrahams [1]. The sufficient conditions shown in these papers are all written
in terms of inequalities that must hold for all $m \in \mathcal{X}$ larger
than some integer.

In [4] we provide a new type of sufficient condition that merely includes
inequalities for infinitely many $m$'s in $\mathcal{X}$, which is stated as

Condition 1. There exist infinitely many $m$'s in $\mathcal{X}$
($m \ge k_0$) such that

$$m \equiv -1 \pmod{D - 1} \qquad (2)$$

and

$$p(m) \ge \sum_{i=m+1}^{\infty} p(i). \qquad (3)$$

Remark 1. It is evident that, in the binary case ($D = 2$), condition (2)
holds for all $m \in \mathcal{X}$.


Our main result is

Theorem 1. If the probability distribution $P$ on $\mathcal{X}$ satisfies
Condition 1, then there exists an algorithm to recursively construct an
optimal $D$-ary prefix code $C_H$.

II. Coding algorithm under Condition 1

To describe how to construct the code $C_H$, let us first introduce some
notation. Let $m_1 < m_2 < \cdots$ be those integers $m$ that satisfy
conditions (2) and (3), where $m_1 \ge k_0$. For each $j = 1, 2, \cdots$ let
us define a partition of $\mathcal{X}$ by
$\mathcal{A}_j = \{A_j^{(k)}\}_{k=0}^{m_j+1}$, where

$$A_j^{(k)} = \begin{cases} \{k\} & \text{for } 0 \le k \le m_j, \\ \{i \mid i \ge k\} & \text{for } k = m_j + 1. \end{cases}$$

For each $j = 1, 2, \cdots$ define the information source $S_j$ with the
finite alphabet

$$\tilde{\mathcal{A}}_j = \{\{m_{j-1} + 1\}, \{m_{j-1} + 2\}, \cdots, \{m_j\}, \{i \mid i \ge m_j + 1\}\}$$

and the probability distribution
$\tilde{P}_j = \{\tilde{p}_j(A_j^{(k)})\}_{k=m_{j-1}+1}^{m_j+1}$ such that

$$\tilde{p}_j(A_j^{(k)}) = \frac{p(A_j^{(k)})}{\alpha_{j-1}} \quad (m_{j-1} + 1 \le k \le m_j + 1),$$

where $p(B) = \sum_{i \in B} p(i)$ for a subset $B \subset \mathcal{X}$ and
we have set $m_0 = -1$ ($\mathcal{A}_0 = \{\mathcal{X}\}$) and
$\alpha_j \triangleq p(A_j^{(m_j+1)})$.

Now we are in a position to describe the coding algorithm to construct the
code $C_H$.

Coding algorithm

Step 0 Set $j := 1$ and $C(*) := \lambda$ (null string).

Step 1 Construct a $D$-ary Huffman code $C_j$ for the information source
$S_j$.

Step 2 If $\{i\} \in \tilde{\mathcal{A}}_j$ then define the codeword for
$i \in \mathcal{X}$ by

$$C_H(i) := C(*) \cdot C_j(\{i\}),$$

where $\cdot$ denotes the concatenation of strings.

Step 3 Set $C(*) := C(*) \cdot C_j(A_j^{(m_j+1)})$.

Step 4 Set $j := j + 1$ and go to Step 1.

Remark 2. For each $j \ge 1$, just after Step 3 the resulting code (the
codewords are $C_H(i)$ for $i$ ($0 \le i \le m_j$), and $C(*)$ for
$i = m_j + 1$) is a Huffman code for the source $S_{\mathcal{A}_j}$ (with
finite alphabet $\mathcal{A}_j$ and probability distribution
$P_{\mathcal{A}_j} = \{p(A_j^{(k)})\}$). Generalizing this property, we get a
new definition of an optimal $D$-ary prefix code which is meaningful even for
the case where $H(P) = \infty$. See [4] for details.

Abstract — “In-place” Huffman coding of a file can cause the file to
temporarily expand. In this paper we investigate this phenomenon.

I.  Introduction 

Huffman codes are widely used for data compression. In a typical
application, a file consisting of $N$ $m$-bit symbols is compressed by an
adaptive version of Huffman’s algorithm, in which the required symbol
probabilities are determined by the relative frequencies of the symbols in
the file. Each $m$-bit symbol in the file is then replaced by the
corresponding Huffman codeword. It is clear that if such a strategy is used,
the file cannot expand, since the “worst case” is when all the Huffman
codewords have length $m$ bits, and in all other cases the file will indeed
be strictly compressed.

However, if the compression is done sequentially and “in place,” that is, if
the first symbol in the file is encoded, then the second, etc., the file may
temporarily expand if many low-probability symbols occur at or near the
beginning of the file. In space-critical implementations of Huffman’s
algorithm, it will then be important to know the amount of extra storage
space that must be allocated to allow for this temporary expansion.

The general problem we address in this paper, then, is this. For a file
consisting of $N$ letters from a source alphabet of $2^m$ symbols, what is
the maximum “temporary expansion” possible for a Huffman code, in units of
bits per file symbol? We denote this quantity by $\delta(m)$.

II.  Example 

Consider an 8-letter symbol alphabet {A, B, C, D, E, F, G, H}, and a file
consisting of the following 16 symbols from the alphabet.

HGFEDCBBBBAAAAAA. 

If each symbol in the alphabet is given a 3-bit representation, the file
length is 48 bits. The relative frequencies, and a set of appropriate Huffman
codewords, for this file are given in the following table.

symbol   rel. freq.   codeword   length
A        3/8          0          1
B        1/4          10         2
C        1/16         1100       4
D        1/16         1101       4
E        1/16         11100      5
F        1/16         11101      5
G        1/16         11110      5
H        1/16         11111      5

After each of the 16 symbols in the file has been replaced by its Huffman
codeword, a simple calculation shows that the fully compressed file is only
42 bits long. However, since the low-frequency symbols C, D, E, F, G, H all
occur at the beginning of the file, after these six symbols have been Huffman
encoded, the file’s length will be 58 bits, which represents an expansion of
5/8 bits per source symbol. In fact, it follows from our results that for an
eight-letter source alphabet this is the worst case, so that
$\delta(3) = 5/8$. (It is easy to see that $\delta(1) = 0$ and
$\delta(2) = 2/5$.)

1 Work done at JPL under contract with the National Aeronautics and Space
Administration.

2 Work done with partial support from the National Science Foundation.
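
The arithmetic of the example can be reproduced with a short sketch. The function name `expansion_profile` is hypothetical; the codeword lengths and the file are taken from the example above, and the in-place encoder is assumed to replace symbols strictly from left to right.

```python
from fractions import Fraction


def expansion_profile(symbols, codeword_length, m):
    """File length in bits before encoding and after each symbol is encoded in place."""
    length = len(symbols) * m          # every symbol starts as an m-bit block
    profile = [length]
    for s in symbols:
        length += codeword_length[s] - m   # swap an m-bit symbol for its codeword
        profile.append(length)
    return profile


# Huffman codeword lengths from the table in the example
lengths = {"A": 1, "B": 2, "C": 4, "D": 4, "E": 5, "F": 5, "G": 5, "H": 5}
file_symbols = "HGFEDCBBBBAAAAAA"

profile = expansion_profile(file_symbols, lengths, m=3)
peak = max(profile)
expansion = Fraction(peak - len(file_symbols) * 3, len(file_symbols))

if __name__ == "__main__":
    print(profile)
    print("peak:", peak, "final:", profile[-1], "expansion per symbol:", expansion)
```

The peak of the profile gives the extra storage an in-place implementation would have to reserve for this particular file.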


III. Alternative definition of $\delta(m)$

There is an alternative definition of $\delta(m)$ that makes the problem
easier to deal with.

Definition. For $m = 1, 2, 3, \ldots$, define

$$\delta(m) = \max \Big\{ \sum_{j=1}^{2^m} p_j (n_j - m)^{+} \Big\},$$

where the maximum is taken over all pairs of lists
$(p_1 \ge p_2 \ge \cdots \ge p_{2^m})$,
$(n_1 \le n_2 \le \cdots \le n_{2^m})$, where the $p_j$’s are an ordered list
of probabilities, and the $n_j$’s are the lengths of a corresponding Huffman
code for the $p_j$’s. (The symbol “$(x)^{+}$” is shorthand for
$\max(x, 0)$.)


IV.  Statement  of  Results 

Theorem 1. There is a universal constant $\Delta$ such that
$\delta(m) \le \Delta$ for all $m$.

The proof of Theorem 1 relies on a recent result of Schack [3]. We
conjecture that $\Delta = 4/5$ (incidentally, for “Shannon” codes it is quite
easy to show that the corresponding quantity is $\Delta_{Shannon} = 1$), but
so far we have only been able to prove that $4/5 \le \Delta \le 4$. The upper
bound comes from a careful examination of the proof of Theorem 1. The lower
bound comes from an explicit construction of a family of Huffman trees, using
techniques similar to those developed in [1] and [2], for which the quantity
$\delta$ equals $(4 \cdot 2^m - 12)/(5 \cdot 2^m - 8)$, for $m \ge 3$. The
probabilities in the tree are proportional to
$(2^m - 2, 2^{m-1}, 1, \ldots, 1)$, and the corresponding Huffman lengths are
$(1, 2, m+1, m+1, m+2, \ldots, m+2)$. Furthermore, we believe that this
construction gives the largest possible value of $\delta$, so we have the
following conjecture.

Conjecture 1. $\delta(m) \uparrow 4/5$, as $m \to \infty$.

I. INTRODUCTION

Varn [8] introduced the problem of finding the prefix condition variable
length source code which minimizes average cost when the code symbols are
of unequal cost and the source symbols are equiprobable. Other authors have
also addressed this problem from the algorithmic point of view [3,4,7]. There
are  two  versions  of  this  problem,  exhaustive  in  which  the  r-ary  code  tree  is 
constrained  to  be  a  full  tree  and  nonexhaustive  in  which  that  constraint  is  not 
imposed.  Recently  the  author  [1]  was  able  to  show,  based  on  previous  work 
by  Horibe  [5]  and  Chang  [2],  for  the  exhaustive  case  that  for  integer  code 
symbol  costs  there  exists  a  very  close  relationship  between  a  subsequence  of 
the  sequence  of  Varn  code  trees  indexed  by  the  number  of  leaves  in  the  tree 
and  a  recursively  generated  sequence  of  trees  called  generalized  Fibonacci 
trees.  In  particular  the  kth  tree  in  the  recursively  generated  sequence  of  trees 
has as its ith leftmost subtree, i=1,...,r, the tree previously generated in the
sequence with index k-c(i), where the code symbol costs are c(i) ordered
monotonically  nondecreasing  in  i  and  the  associated  code  symbols  are 
associated  with  the  code  tree  branches  from  left  to  right.  The  initialization  is 
that  the  first  c(r)  trees  are  all  single  root  nodes.  Then,  when  the  number  of 
leaves  in  the  exhaustive  Varn  code  tree  is  the  same  as  the  number  of  leaves 
in  the  generalized  Fibonacci  tree  for  the  same  code  symbol  costs,  they  are  the 
same  tree.  The  recursive  construction  is  nice  in  that  it  reveals  an  elegant 
structure  underlying  the  sequence  of  Varn  code  trees  and  also  because 
recurrence  relations  derived  from  the  recursive  construction  permit  the 
evaluation  of  the  resulting  minimum  average  cost  codes  without  actually 
constructing  the  tree. 

The  problem  addressed  in  this  abstract  is  to  identify  a  similar  recursive 
construction  for  the  nonexhaustive  case.  It  turns  out  that  it  is  possible  to  do 
this not for Varn's original problem, but for a close variant of it. While Varn
looks  for  optimum  codes  in  the  minimum  average  codeword  cost  sense,  the 
problem  of  interest  here  will  be  to  look  for  optimum  codes  in  the  sense  of 
minimizing  the  maximum  codeword  cost.  It  is  not  hard  to  see  that  in  the 
exhaustive  case,  Varn's  algorithm  finds  optimum  code  trees  in  both  senses, 
that  is,  the  minimum  average  codeword  cost  tree  is  also  the  minimax  tree.  But 
this  is  not  the  case  for  nonexhaustive  codes.  Perl  et  al.  [7]  give  a  simple 
algorithm  for  the  minimax  problem  as  a  "remark"  in  their  paper  otherwise 
concerned  with  the  minimum  average  codeword  cost  case.  So,  as  we’ll  see,  it 
is  the  minimax  version  of  Varn’s  problem  which  has  the  Fibonacci-like 
structure.  It  will  also  turn  out  that  under  certain  conditions  on  the  code 
symbol  costs,  minimax  and  minimum  average  codeword  cost  trees  are 
identical. One nonexhaustive special case for which this is true, c(i)=i,
i=1,2,..., was treated previously in the literature by Patt [6] motivated by a
computer  file  search  problem. 

II.  CONSTRUCTING  NONEXHAUSTIVE  TREES 
RECURSIVELY 

As in the exhaustive case, the kth tree in our sequence, T(k), will have
T(k-c(i)) as its ith leftmost subtree. However, now the initialization will be
T(1)=...=T(c(2)), each consisting of a single root node. One example is given
in this abstract. The costs are c(1)=2, c(2)=c(3)=3, c(4)=5. The trees are
described by labeling leaf nodes with their costs, listing them in left to right
order with sibling nodes separated by + signs, and using parentheses to
indicate depth in the tree from the root: T(1)=T(2)=T(3)=0, T(4)=(2+3+3),
T(5)=(2+3+3), T(6)=((4+5+5)+3+3+5),
T(7)=((4+5+5)+(5+6+6)+(5+6+6)+5), .... It is not hard to give recurrence
expressions for the number of leaves in the kth tree and for its unnormalized
cost. These can be solved by the method of generating functions.
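
The recursive construction and the leaf-cost notation of the example can be sketched as follows. This is an illustration under stated assumptions: the names `build_tree` and `render` are hypothetical, trees are represented as nested lists, and a subtree whose index k-c(i) falls below 1 is assumed to be absent, which is what the listed trees T(4) and T(5) suggest.

```python
COSTS = (2, 3, 3, 5)  # c(1), ..., c(4) from the example, nondecreasing


def build_tree(k, costs=COSTS):
    """Nested-list form of T(k): a leaf is [], an interior node lists its subtrees."""
    if k <= costs[1]:
        return []  # T(1) = ... = T(c(2)) are single root nodes
    # ith leftmost subtree is T(k - c(i)); indices below 1 are assumed absent
    return [build_tree(k - c, costs) for c in costs if k - c >= 1]


def render(tree, cost=0, costs=COSTS):
    """Write a tree in the abstract's notation: leaf costs joined by '+', nested by depth."""
    if not tree:
        return str(cost)
    # costs are nondecreasing, so any absent subtrees are the rightmost ones
    parts = [render(sub, cost + c, costs) for sub, c in zip(tree, costs)]
    return "(" + "+".join(parts) + ")"


if __name__ == "__main__":
    for k in range(1, 8):
        print(f"T({k}) = {render(build_tree(k))}")
```

The printed strings can be compared directly with the trees listed in the example.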

III.  PROOF  OF  MINIMAX  OPTIMALITY 

The  idea  of  the  proof  that  the  trees  constructed  in  the  previous  section  are 
Varn minimax trees is outlined here. First it is shown that the nonexhaustive


generalized  Fibonacci  trees  are  the  same  as  the  exhaustive  generalized 
Fibonacci  trees  with  a  certain  number  of  highest  cost  leaves  removed.  This 
can  be  done  by  induction.  Then  we  make  use  of  an  argument,  like  Varn's  for 
the  minimum  average  cost  case,  that  optimal  minimax  nonexhaustive  code 
trees  are  obtained  by  deleting  highest  cost  leaves  from  a  particular  "correct" 
optimal  exhaustive  tree  while  maintaining  the  same  number  of  interior  nodes. 
The  hard  part  is  to  show  that  the  exhaustive  generalized  Fibonacci  tree  in  its 
sequence  beginning  with  T(c(r)+1)  is  the  "correct"  tree  for  the  corresponding 
nonexhaustive  generalized  Fibonacci  tree  in  its  sequence  beginning  with 
T(c(2)+1).  To  do  this  we  need  to  show  that  if  we  started  with  any  other 
optimal  exhaustive  tree  and  deleted  the  appropriate  number  of  highest  cost 
leaves,  we  would  either  get  a  more  costly  tree  in  the  minimax  sense  or  would 
have  to  remove  leaves  in  such  a  way  as  to  leave  what  was  an  interior  node 
childless.  The  details  of  this  demonstration  are  omitted  in  this  abstract  for 
conciseness. 

IV.  WHEN  ARE  MINIMAX  CODE  TREES  MINIMUM 
AVERAGE  COST? 

The  algorithm  of  Perl  et  al.  [7]  for  minimum  average  cost  trees  involves  two 
stages,  extension  and  mending,  and  their  algorithm  for  minimax  trees  is  a 
variant  of  the  extension  stage.  They  also  give  sufficient  conditions  on  the 
code  symbol  costs  for  the  mending  stage  to  be  unnecessary  in  the  minimum 
average  cost  problem.  Thus,  whenever  any  of  these  sufficient  conditions  is 
satisfied  by  the  costs,  and  the  costs  are  such  that  the  variant  of  the  extension 
algorithm  for  minimax  codes  and  the  original  extension  algorithm  for 
minimum  average  cost  codes  both  yield  the  same  tree,  minimax  and  minimum 
average  cost  code  trees  are  the  same  and  the  minimum  average  cost  tree 
sequence  shares  the  nice  recursive  structure  of  the  minimax  tree  sequence. 
Patt's [6] costs are one such example, and his paper includes the recursive tree
sequence  structure. 

V.  CONCLUSION 

The  highly  structured  recursively  constructed  subsequence  of  optimal 
exhaustive code trees has been extended to the nonexhaustive case by
focusing  on  the  minimax  optimality  criterion  instead  of  Varn’s  original 
minimum  average  codeword  cost  criterion.  When  these  two  criteria  give  the 
same  sequence  of  code  trees,  as  they  do  under  certain  conditions  on 
the  costs,  omitted  here  for  conciseness,  the  recursive  structure 
applies  to  Varn's  original  problem  as  well. 

Abstract — Given a programmable finite-state input/output device, what
program(s) maximally reduces the “diversity” of the possible output sequences
of the device? This question is made precise, and a method is developed to
determine this minimum achievable diversity.

I.  Introduction 

In this paper, a (time-invariant) finite-state entropy-reduction algorithm,
or briefly, an algorithm, is a synchronous finite-state input/output device
(see e.g. [1]); such a device takes inputs from a given source alphabet B,
and, depending on the input symbol and its internal state, produces an output
symbol in an output alphabet F and moves to a new internal state.

In the context of channel codes (modulation codes), such a device is usually
referred to as a (synchronous) finite-state encoder [1] and is used to
translate or encode arbitrary sequences of source symbols into sequences that
have certain desirable properties. In that context, it is of course required
that decoding is possible.

Here, we think of such devices as performing some sort of data compression on
sequences, and we are interested in algorithms that have a “small” output
space. A natural measure of the efficiency of an algorithm is thus the
topological entropy of the output space, which measures the growth rate of
the number of output sequences of a given length. (In the context of channel
codes, the entropy is usually called the (Shannon) capacity [1].) We allow
the case where the input sequences are restricted to a given constrained
system over B.

The use of the term “data compression” may cause confusion, since we do not
consider the reconstruction problem. Instead, it might be better to speak of
entropy-reduction: the algorithm transforms data sequences and the efficiency
of the entropy-reduction is measured by the number of distinct output
sequences that can be produced by the algorithm.

Now let $\mathcal{F}$ be a finite collection of algorithms, all having
the same source alphabet and sharing a common set of internal states.
A time-varying (entropy-reduction) algorithm over $\mathcal{F}$ is a
sequence

$$\mathbf{f} = f_1, f_2, f_3, \ldots$$

of algorithms $f_t \in \mathcal{F}$. We think of such a sequence
$\mathbf{f}$ as an algorithm whose action at time t is directed by
algorithm $f_t$. So at time t, t = 1, 2, ..., the algorithm
$\mathbf{f}$ takes an input from the source alphabet and, depending on
this input and its internal state, produces an output and moves to
another internal state according to algorithm $f_t$. The collection of
all entropy-reduction algorithms over $\mathcal{F}$ (all sequences over
$\mathcal{F}$) will be denoted by $\mathcal{F}^\infty$.

Interestingly, it may happen that some time-varying algorithm in
$\mathcal{F}^\infty$ performs better than the best algorithm in
$\mathcal{F}$. (This will be shown by some examples.) So the question
now arises how to produce lower bounds for the efficiency of algorithms
in $\mathcal{F}^\infty$ and how to find the best time-varying algorithm
in $\mathcal{F}^\infty$. We will refer to this problem as the optimal
entropy-reduction problem for $\mathcal{F}$.

The motivation to investigate time-varying entropy-reduction stems
from a problem in [2] and [3] on ordering in sequence spaces, a subject
introduced in [4] to study certain types of organization processes. We
will outline these ordering problems, and we will show that they may be
considered as special instances of the optimal entropy-reduction problem
considered here.

We show how the optimal entropy-reduction problem can be transformed
into a problem for a related finitely-generated semigroup of
non-negative matrices. Briefly stated, we will show that with each
algorithm $f$ in $\mathcal{F}$ we may associate a non-negative matrix
$D_f$ such that the efficiency of an algorithm
$\mathbf{f} = f_1, f_2, \ldots$ in $\mathcal{F}^\infty$ is measured by
the number

$$\mu(\mathbf{f}) = \limsup_{t \to \infty} \lambda(D_{f_1} D_{f_2} \cdots D_{f_t})^{1/t},$$

where $\lambda(D)$ denotes the largest real (Perron-Frobenius)
eigenvalue of a non-negative matrix D. The number

$$\mu(\mathcal{F}) = \inf_{\mathbf{f} \in \mathcal{F}^\infty} \mu(\mathbf{f}),$$

which can be thought of as the minimum growth rate of matrix products
in the semigroup generated by the matrices $D_f$, $f \in \mathcal{F}$,
then provides the solution to the optimal entropy-reduction problem.
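
The growth-rate quantity above can be explored numerically on a toy case. The sketch below is only an illustration under stated assumptions: the two 2x2 matrices in `D` are hypothetical and are not derived from any actual algorithm, the closed-form eigenvalue works for 2x2 non-negative matrices only, and the function names are ours. For each finite word w over the matrix indices it evaluates $\lambda(D_w)^{1/|w|}$, the per-step rate obtained when the word is repeated periodically, and searches short words for the smallest value.

```python
import math
from itertools import product

def perron_2x2(m):
    """Largest real eigenvalue of a non-negative 2x2 matrix (closed form)."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    disc = max(tr * tr - 4.0 * det, 0.0)  # equals (a-d)^2 + 4bc >= 0 here
    return (tr + math.sqrt(disc)) / 2.0

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def growth_rate(word, mats):
    """lambda(D_{w_1} ... D_{w_n})^(1/n) for a finite word of matrix indices."""
    prod = ((1.0, 0.0), (0.0, 1.0))
    for idx in word:
        prod = matmul(prod, mats[idx])
    return perron_2x2(prod) ** (1.0 / len(word))

def best_periodic(mats, max_len):
    """Smallest per-step rate over all words of length at most max_len."""
    best = None
    for n in range(1, max_len + 1):
        for word in product(range(len(mats)), repeat=n):
            rate = growth_rate(word, mats)
            if best is None or rate < best[0] - 1e-12:
                best = (rate, word)
    return best

# Hypothetical matrices: each alone has Perron eigenvalue 2,
# while alternating them gives a per-step rate of 1.
D = [((2.0, 1.0), (0.0, 0.0)), ((0.0, 0.0), (1.0, 2.0))]
rate, word = best_periodic(D, 2)
```

In this toy case each matrix used on its own grows at rate 2, whereas alternating the two grows at rate 1, which mirrors the remark that a time-varying algorithm may outperform every fixed one. An exhaustive search over short words only samples periodic schedules; it does not by itself establish the infimum.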

We then investigate this semigroup problem. We present a method to
obtain lower bounds for the optimal efficiency $\mu(\mathcal{F})$. In
fact, we conjecture that in many cases our method will be able to
determine the exact value of $\mu(\mathcal{F})$. Our results generalise
(part of) Perron-Frobenius theory for non-negative matrices to
semigroups of such matrices.

Later, we return to the ordering problem. We will use our method to
determine $T_2(0,2,1)$, the optimal efficiency of a time-varying binary
ordering algorithm in the class $(0, 2, 1, T^+, O^-)$ [4]. We will show
that $T_2(0,2,1) = \tfrac{1}{2}\log_2(2 + \sqrt{3})$, as conjectured in
[2]. Our approach suggests that (at least in principle) other values of
$T_q(\pi, \beta, \varphi)$ may be computed similarly.

Finally, we discuss our results.

Abstract — For Markov sources, we consider a generalization of
variable-to-fixed length codes and find the optimal code and its
performance as the dictionary size approaches infinity.

I.  Introduction 

A variable-to-fixed length coder can be decomposed into a parser and
a string encoder. The parser segments the source output into a
concatenation of variable-length strings, each of which belongs to a
dictionary with M entries. The string encoder maps each dictionary
entry into a fixed-length codeword. Variable-to-fixed length codes can
take advantage of the memory of the source when the dictionary entries
are roughly equiprobable. Furthermore, the Lempel-Ziv codes can be
viewed as universal variable-to-fixed length codes.

II.  Problem  Formulation 

A Markov source with finite alphabet $\{0, \ldots, K-1\}$ and set of
states $\{0, \ldots, R-1\}$ is defined by specifying, for each state s
and letter j,

1. the probability $p_{sj}$ that the source emits j from state s;

2. the next state $S[s, j]$ after j is issued from state s.

Given any initial state $s_0$, these rules inductively specify both the
probability $P(\sigma|s_0)$ that any given source string $\sigma$ is
emitted and the resulting state $S[s_0, \sigma]$ after $\sigma$ is
output; they also determine $\mathcal{H}$, the entropy of the source in
natural units, $H(s)$, the entropy in natural units of the next source
symbol emitted from state s, and $\pi_s$, the long-run proportion of
time that the source is in state s.

The dictionaries that we consider are uniquely parsable; i.e., every
source sequence has exactly one prefix in the dictionary. This
condition implies that $M = a(K-1) + 1$ for some integer a; here, a is
the number of intermediate nodes in the dictionary tree, including the
root.

The best variable-to-fixed length code has a dictionary that
maximizes the steady-state expected number, E[L], of source letters per
dictionary entry. For the special case of a discrete, memoryless
source, E[L] is the sum of the probabilities associated with each
intermediate node in the tree, including the root. For more general
Markov sources, it is considerably harder to evaluate E[L] since the
probability of a dictionary entry, starting at a parsing point, depends
on the state probabilities at parsing points, which in turn depend on
the dictionary itself.
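
The memoryless identity mentioned above (E[L] equals the sum of the probabilities of the intermediate nodes, root included) can be checked on a small case. The following is a minimal sketch; the binary source probabilities and the four-entry dictionary are hypothetical values chosen only for illustration, and the function names are ours.

```python
def string_prob(word, p):
    """Probability of a string under a memoryless source with letter probabilities p."""
    prob = 1.0
    for letter in word:
        prob *= p[letter]
    return prob

def expected_length_from_entries(dictionary, p):
    """E[L] computed directly: sum over entries of P(entry) * len(entry)."""
    return sum(string_prob(w, p) * len(w) for w in dictionary)

def intermediate_nodes(dictionary):
    """All proper prefixes of dictionary entries, including the empty root."""
    return {w[:n] for w in dictionary for n in range(len(w))}

def expected_length_from_nodes(dictionary, p):
    """E[L] computed as the sum of the intermediate-node probabilities."""
    return sum(string_prob(node, p) for node in intermediate_nodes(dictionary))

# Hypothetical example: binary source, uniquely parsable dictionary with M = 4.
p = {"0": 0.7, "1": 0.3}
dictionary = ["000", "001", "01", "1"]
direct = expected_length_from_entries(dictionary, p)
via_nodes = expected_length_from_nodes(dictionary, p)
```

Both computations give 2.19 source letters per entry here, and the dictionary size agrees with M = a(K - 1) + 1 for a = 3 intermediate nodes and K = 2.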

To gain insights into codes for Markov sources, we will consider a
broader class of codes in which there is a uniquely parsable dictionary
$\mathcal{D}_s$ of size M associated with each state s. For these
codes, the parser determines the source state s after each parsing
point and then uses $\mathcal{D}_s$ to find the next parsed string. We
would like to find a good way to design the dictionaries
$\mathcal{D}_s$. Let $C_s$ represent the expected length of a
dictionary entry for $\mathcal{D}_s$; then, for all s,

$$C_s = \sum_{\text{intermediate nodes } \sigma \text{ for } \mathcal{D}_s} P(\sigma|s).$$

In [1], $\mathcal{D}_s$ was chosen to maximize $C_s$ for each state s.
This code, called the generalized Tunstall code, maximizes the expected
number of source symbols per parse for each state, but does not
necessarily lead to good parsing probabilities.

III.  New  Contributions 

There is another way to address the problem of selecting the
dictionaries $\{\mathcal{D}_s\}$. Let $H_s$ denote the entropy of the
entries in $\mathcal{D}_s$ and let $H$ represent the steady-state
average self-information between successive parsing points. We have

Theorem 1 $H = \mathcal{H} \cdot E[L]$.

For memoryless sources, this "conservation of entropy" theorem was
established for codes with one uniquely parsable dictionary in [2]; our
proof indicates that it applies for a much larger set of codes than the
ones we discuss here. Theorem 1 suggests that we may get good
dictionaries by maximizing $H_s$ for each s. The "leaf entropy" theorem
of [3] implies that

$$H_s = \sum_{\text{intermediate nodes } \sigma \text{ for } \mathcal{D}_s} P(\sigma|s)\, H(S[s,\sigma]), \qquad 0 \le s \le R-1;$$

the expressions for $C_s$ and $H_s$ suggest that we should consider
dictionaries that maximize

$$X(s) = \sum_{\text{intermediate nodes } \sigma \text{ for } \mathcal{D}_s} P(\sigma|s)\, x_{S[s,\sigma]}$$

for some choice of $\mathbf{x} = (x_0, \ldots, x_{R-1})$; $x_i$ is
called the weight of state i and $P(\sigma|s)\, x_{S[s,\sigma]}$ is the
state s return of string $\sigma$. A desirable feature of the
generalized Tunstall code is that it can be constructed in a greedy
manner; i.e., for each state s and string $\sigma$, the state s return
of $\sigma$ is at most the state s return of any proper prefix of
$\sigma$, so the nodes with the largest state s return can be selected
one by one starting with the null string. A necessary and sufficient
condition for a greedy construction is that the weight vector
$\mathbf{x}$ is in the set

$$\Omega = \{\mathbf{x} = (x_0, \ldots, x_{R-1}) : \mathbf{x} > 0,\ p_{ij}\, x_{S[i,j]} \le x_i\ \ \forall i, j\}.$$
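
The greedy selection by state-s return can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' construction: the transition tables, the weight vectors, and the function names are hypothetical, the membership test uses the condition $p_{ij} x_{S[i,j]} \le x_i$, and the dictionary for a state is grown by repeatedly turning the leaf with the largest return into an intermediate node.

```python
import heapq

def in_greedy_set(p, nxt, x):
    """Check x > 0 and p[i][j] * x[nxt[i][j]] <= x[i] for all states i, letters j."""
    if any(xi <= 0 for xi in x):
        return False
    return all(
        p[i][j] * x[nxt[i][j]] <= x[i] + 1e-12
        for i in range(len(p))
        for j in range(len(p[i]))
    )

def greedy_dictionary(p, nxt, x, s, splits):
    """Grow the dictionary for state s by splitting the largest-return leaf.

    Each heap entry is (-return, string, P(string | s), state after string),
    where the return of a string is P(string | s) * x[state after string].
    """
    leaves = [(-x[s], (), 1.0, s)]
    for _ in range(splits):
        _, sigma, prob, state = heapq.heappop(leaves)
        for j in range(len(p[state])):
            q = prob * p[state][j]
            t = nxt[state][j]
            heapq.heappush(leaves, (-q * x[t], sigma + (j,), q, t))
    return sorted(entry[1] for entry in leaves)

# Hypothetical two-state binary source: the next state is the letter just emitted.
p = [[0.9, 0.1], [0.5, 0.5]]
nxt = [[0, 1], [0, 1]]
entries = greedy_dictionary(p, nxt, [1.0, 1.0], s=0, splits=2)
```

With all weights equal to one, the return of a string reduces to its probability, which corresponds to maximizing the expected entry length for each state. Two splits from state 0 give the entries (0,0), (0,1) and (1,).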

We  have  the  following  results. 

Theorem  2  Let  /(x)  =  7Ti7i(i)  ln(^r  irrXr/xi)  and 

C  =  H  In  ((K  -  1)/H)  -  E.to'Efjo1  J  (~  ^  Vi, if  1 2-  The 
weight  vector  x*  =  (#2,  •  •  • ,  xr~i)  of  the  asymptotically  best 
greedy  code  is  given  by  x*  —  arg  minxes  /(x*);  the  asymptotic 
compression  achieved  by  this  code  satisfies 

(lnM)'  c  +  /(xT 

Corollary 1 If $p_{i,j}\, H(S[i,j]) \le H(i)$ for all i and j, then the
greedy code with weight vector $(H(0), \ldots, H(R-1))$ is
asymptotically the best generalized variable-to-fixed length code.

Abstract — The entropy $H(X)$ of a discrete random variable X of
alphabet size m is always non-negative and upper-bounded by $\log m$.
In this paper, we present a theorem which gives a non-trivial lower
bound for $H(X)$. We will show that for any discrete random variable X
with range $R = \{x_0, \ldots, x_{m-1}\}$, if $p_i = \Pr\{X = x_i\}$
and $p_0 \ge p_1 \ge \cdots \ge p_{m-1}$, then

$$H(X) \ge \frac{2 \log m}{m-1} \sum_{i=0}^{m-1} i\, p_i \qquad (1)$$

with equality iff

(i) X is uniformly distributed, i.e., $p_i = 1/m$ for all i, or
trivially

(ii) $p_0 = 1$, and $p_i = 0$ for $1 \le i \le m-1$.

I.  Introduction 

For a discrete random variable X with range
$R = \{x_0, \ldots, x_{m-1}\}$ and $p_i = \Pr\{X = x_i\}$ for
$0 \le i \le m-1$, the entropy of the random variable is defined as [1]

$$H(X) = -\sum_{i=0}^{m-1} p_i \log p_i. \qquad (2)$$

The upper bound, $H(X) \le \log m$, with equality iff the random
variable X is uniformly distributed, is well known. In this paper, we
will prove a theorem that gives a useful lower bound on entropy for
finitely valued discrete random variables. In [2], an upper bound for a
constrained entropy of infinitely valued discrete random variables is
shown under certain conditions. Our theorem provides a lower bound for
the constrained entropy.


II.  The  Theorem 

Let us define a convex region $K_m$ and a set A:

$$K_m = \{(x_1, \ldots, x_m) : 1 \ge x_i \ge x_j \ge 0,\ i < j,\ \textstyle\sum_{i=1}^{m} x_i = 1\},$$

$$A = \{a_1, \ldots, a_m\}, \qquad a_i = (\tfrac{1}{i}, \ldots, \tfrac{1}{i}, 0, \ldots, 0), \quad 1 \le i \le m.$$

Clearly, $A \subset K_m$. The following lemmas can be shown:

Lemma 1. $K_m$ is the convex hull of A.

Lemma 2. If $f(x)$ is convex $\cup$ on $K_m$, then

$$f(x) \le \max\{f(a_1), \ldots, f(a_m)\}. \qquad (3)$$

If $f(x)$ is strictly convex, then equality holds iff $x = a_i$ for some
i such that $f(a_i)$ is the maximum among all $f(a_1), \ldots, f(a_m)$.

Consider the function

$$f(x) = f(x_1, \ldots, x_m) = \sum_{i=1}^{m} \left(\log x_i + \frac{2(i-1)}{m-1} \log m\right) x_i, \qquad (4)$$

defined on the convex region $K_m$. Since
$\partial^2 f / \partial x_i^2 > 0$ for $1 \le i \le m$, $f(x)$ is a
strictly convex $\cup$ function on $K_m$. We can show $f(x) \le 0$,
which implies our theorem:

Theorem. For any discrete random variable X with range
$R = \{x_0, \ldots, x_{m-1}\}$, if $p_i = \Pr\{X = x_i\}$ and
$p_0 \ge p_1 \ge \cdots \ge p_{m-1}$, i.e., the $p_i$ are in
non-increasing order, then

$$H(X) \ge \frac{2 \log m}{m-1} \sum_{i=0}^{m-1} i\, p_i \qquad (5)$$

with equality iff

(i) X is uniformly distributed, i.e., $p_i = 1/m$ for all i, or

(ii) $p_0 = 1$, and $p_i = 0$ for $1 \le i \le m-1$.
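
The bound of the theorem is easy to evaluate numerically. The sketch below is illustrative only (the function names and the random test distributions are ours): it sorts the probabilities into non-increasing order, computes the lower bound $\frac{2\log m}{m-1}\sum_i i\,p_i$ in bits, and compares it with the entropy, including the two equality cases.

```python
import math
import random

def entropy_bits(p):
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def lower_bound_bits(p):
    """(2 log m / (m - 1)) * sum_i i * p_i with p sorted in non-increasing order."""
    m = len(p)
    ordered = sorted(p, reverse=True)
    return 2.0 * math.log2(m) / (m - 1) * sum(i * q for i, q in enumerate(ordered))

def random_distribution(m, rng):
    """A random probability vector of length m (hypothetical test data)."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

uniform = [1.0 / 8] * 8          # equality case (i)
point_mass = [1.0, 0.0, 0.0]     # equality case (ii)
skewed = [0.5, 0.25, 0.25]       # strict inequality: 1.5 bits vs. about 1.19 bits
```

For the uniform distribution on eight values both sides equal 3 bits, and for a point mass both sides are zero.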


III.  Examples 

We  can  compute  lower  bounds  for  two  specific  examples  using 
the  above  theorem. 

(1)  Geometrical  Distribution: 


H(X)  > 


2  log  m  fl 
m  —  1 


1 

2  m- ! 


) 


^+0(1).  (6) 

m  —  1 


(2)  Binomial  Distribution: 

(rS=!,) 


H(X)  >  log m-  (- 


/¥lo|m+o(l)  (?) 

V  7r  ym 


IV.  Remark 

Let us define

$$H(m, a) = \left\{ -\sum_{i=1}^{m} p_i \log p_i \ :\ \sum_{i=1}^{m} i\, p_i = a \right\}. \qquad (8)$$

Here $a = \sum_{i=1}^{m} i\, p_i$ can be viewed as the average number of
guesses with an optimum strategy needed to guess the value of a random
variable X [2]. Let $H_L(m,a) = \min H(m,a)$ and
$H_U(m,a) = \max H(m,a)$. Clearly, $H_U(m,a)$ is a monotonically
increasing function of m. An upper bound on $H(m,a)$ is [2]

$$H_U(m,a) \le \lim_{m \to \infty} H_U(m,a) \le \log(a-1) + 1, \qquad (9)$$

when $a > 2$. From our theorem, we can provide a lower bound on
$H(m,a)$:

$$H_L(m,a) \ge \frac{2 \log m}{m-1}\,(a-1). \qquad (10)$$

Abstract — Two entropy-based divergence classes are compared using
the associated quadratic differential metrics, mean values and
projections.

I. TWO CLASSES OF DIVERGENCES
The design concepts of divergences are of interest because of the key
role they play in statistical inference and signal processing. Most of
the existing divergences D between two probability distributions may be
associated with an integral or non-integral entropy functional
$H_\nu(\mu)$ with respect to some reference measure $\nu$. We
distinguish two different classes of divergences built on entropies.
The first one is the well-known class of f-divergences $I_f$ [4], which
are based on the likelihood ratio and are formally identical to the
above entropies: $I_f(\mu, \nu) = -H_\nu(\mu)$. In the integral case,
this yields the relative entropy class, which includes Kullback
information as its most prominent member [4]. The most important
instance of non-integral f-divergences is Renyi information [13].

The second class of divergences builds upon the concavity of an
entropy functional, which entails that, for $0 < \alpha < 1$,
$J_H^{(\alpha)}(\mu, \nu) = H((1-\alpha)\mu + \alpha\nu) - (1-\alpha)H(\mu) - \alpha H(\nu)$
is positive. One can then construct
$C_H(\mu, \nu) = \max_\alpha J_H^{(\alpha)}(\mu, \nu)$, a Jensen
difference $J_H(\mu, \nu) = J_H^{(1/2)}(\mu, \nu)$, and a Bregman
distance
$D_H(\mu, \nu) = \lim_{\alpha \to 0} \alpha^{-1} J_H^{(\alpha)}(\mu, \nu)$.
Bregman distances enjoy a Euclidean-like property, similar to the
Pythagorean theorem [9, 7], when involved in projections onto
exponential or mixture families. This may be further generalized to
projections onto '$\alpha$-families' as shown in [2], where families of
distributions are dealt with as differential manifolds. Still in this
geometrical vein, the interplay between $C_H$, $J_H$ and $D_H$ can be
understood via Thales' theorem.
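
The limit defining the Bregman distance can be checked numerically for one concrete entropy. The sketch below assumes H is the Shannon entropy (in nats) on a finite alphabet; this choice, the example distributions, and the function names are ours. Differentiating the Jensen-type difference at $\alpha = 0$ shows that, for this H, the limit equals the Kullback information of the second argument relative to the first, and the code compares a finite-difference approximation with that closed form.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats of a finite distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def jensen_alpha(mu, nu, alpha):
    """H((1 - a) mu + a nu) - (1 - a) H(mu) - a H(nu) for Shannon entropy H."""
    mix = [(1.0 - alpha) * m + alpha * n for m, n in zip(mu, nu)]
    return (shannon_entropy(mix)
            - (1.0 - alpha) * shannon_entropy(mu)
            - alpha * shannon_entropy(nu))

def bregman_from_limit(mu, nu, alpha=1e-6):
    """Finite-difference approximation of lim_{a -> 0} J^(a)(mu, nu) / a."""
    return jensen_alpha(mu, nu, alpha) / alpha

def kullback(p, q):
    """Kullback information sum_i p_i log(p_i / q_i) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical example distributions on three points.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.5, 0.3]
approx = bregman_from_limit(mu, nu)
exact = kullback(nu, mu)
```

The agreement of `approx` and `exact` illustrates, for this one entropy, how Kullback information arises as a Bregman distance; the Jensen difference itself is `jensen_alpha(mu, nu, 0.5)`.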

A local quadratic differential metric is associated with any
divergence measure [12, 2]. Based on the fact that f-divergences are
locally equivalent to the Riemannian metric defined by the Fisher
information matrix, we characterize the intersection of the two above
divergence classes. In particular, it is easily found that the only
Bregman distance $D_H$ which is an f-divergence is Kullback information
[9, 7]. Similarly, it is found that the only f-divergences which can be
written as a Jensen difference $J_H$ are those introduced in [11, 10].

II. Associated mean values and projections

Mean values can be associated with entropy-based divergences in two
different ways. The first way [13, 1] consists in writing explicitly
the generalized mean values
$\phi^{-1}\left(\sum_{i=1}^{n} \beta_i\, \phi(p_i)\right)$ underlying
integral and non-integral f-divergences. Here the $\beta$'s are
normalized positive weights. For Renyi information,
$\phi(u) = u^\alpha$, and this results in $\alpha$-mean values
$\left(\sum_{i=1}^{n} \beta_i\, p_i^\alpha\right)^{1/\alpha}$.

The second way [3] consists in defining mean values by
$\arg\min_v \sum_{i=1}^{n} \beta_i\, d(v, u_i)$, namely as projections,
in the sense of distance d, onto the half-line
$u_1 = \cdots = u_n > 0$ [7]. When d is an integral f-divergence
$d(v, u_i) = u_i\, f(v/u_i)$, this gives the entropic means [3], which
are characterized implicitly by
$\sum_{i=1}^{n} \beta_i\, f'(v/u_i) = 0$, and necessarily homogeneous
(scale invariant). The class of entropic means includes all available
integral means and, when applied to a random variable, contains most
centrality measures (moments, quantiles). When d is a Bregman distance
$d_h(u, v) = h(u) - h(v) - (u - v)\,h'(v)$, the corresponding mean
values are exactly the above generalized mean values (for
$\phi = h'$), which are generally not homogeneous.

1 The authors are also with Gdr Cnrs no 134 'Traitement du Signal et
Images'.

The only generalized mean value which is also an entropic mean, and
thus both an f-divergence projection and a Bregman projection, is the
above $\alpha$-mean value, corresponding to Renyi information [3]. This
agrees with invariance properties of means [8] and the axiomatics of
inference in [5].

Finally, we mention that mutual information (viewed both as relative
entropy and Jensen difference) and the related concepts of channel
capacity [6] and information radius [14] can be seen as another manner
of investigating the intersection of the above two divergence classes.

Abstract — We reconsider the minimum/optimal bit-error probability
receiver (OBER) for intersymbol interference channels with Gaussian
noise and the reception of finite blocks of bits. We view the OBER as a
function with two inputs, the received sequence and an expected
signal-to-noise ratio, and one output, the estimated block of bits.
Assuming that all sequences are equally likely to be transmitted, we
prove two results about the behaviour of the OBER. We show that the
OBER coincides with the maximum likelihood sequence detector when
designed for high signal-to-noise ratios and that it collapses to a
matched filter followed by a hard-limiting device for low expected
signal-to-noise ratios.

I. A BLOCK TRANSMISSION SYSTEM MODEL

After the introduction of the Viterbi detector as a Maximum
Likelihood Sequence Detector (MLSD) [3], the optimal, or minimum,
bit-error probability receiver (OBER) [1], [2], [4] for intersymbol
interference (ISI) channels has not been given much attention as a
practical receiver. We reconsider the OBER for block transmission
systems, to gain insight into its properties and its relation to the
MLSD.

Consider the transmission of blocks of binary data through a channel
with known ISI and additive Gaussian noise at the receiver. Let the
vector $\mathbf{b} \in \{-1, +1\}^N$ denote the block of independent
bits to be transmitted. We represent the transmission system in matrix
notation as:

$$\mathbf{y} = H\mathbf{b} + \mathbf{n}, \qquad (1)$$

where H is a deterministic and known matrix representing the ISI, the
noise $\mathbf{n} \in \mathcal{N}(0, \sigma_n^2 I)$ and $\mathbf{y}$ is
the $(N + L) \times 1$ stochastic vector observed by the receiver.
Further, let $\eta$ denote the outcome of $\mathbf{y}$.

II. The Optimal Bit-Error Probability Receiver

Let us consider the detection of $(\mathbf{b})_{\text{bit } k}$. A
geometric interpretation of this binary hypothesis testing is that of
choosing the correct halfcube:

$$H_0:\ \mathbf{b} \in C_k^+, \qquad H_1:\ \mathbf{b} \in C_k^-. \qquad (2)$$


Proposition 1: Let $\psi(\mathbf{y}|\beta)$ denote the conditional
density of $\mathbf{y}$ given that the sequence $\beta$ was transmitted
(here multi-dimensional Gaussian). Furthermore, let

$$\hat{\mathbf{b}}_{\mathrm{OBER}}(\mathbf{y}) \triangleq \Gamma(\mathbf{y}, \sigma_n) \triangleq \mathrm{sign}\Big( \sum_{\beta \in C_1^+} w(\mathbf{y}, \beta)\, \beta \Big) \qquad (4)$$

and
$w(\mathbf{y}, \beta) = \psi(\mathbf{y}|\beta)\Pr\{\mathbf{b} = \beta\} - \psi(\mathbf{y}|{-\beta})\Pr\{\mathbf{b} = -\beta\}$,
where $\Pr\{\mathbf{b} = \beta\}$ is the probability of the sequence
$\beta \in C$ being transmitted. Then
$\hat{\mathbf{b}}_{\mathrm{OBER}}(\mathbf{y})$ is the detector of the
transmitted bits that minimizes the bit-error probability. ■

Note that (4) represents a parallel block processor structure,
simultaneously detecting all the individual bits.

As indicated by (4), we find it instructive to view the OBER as a
function $\Gamma(\mathbf{y}, \sigma)$ with two inputs, $\mathbf{y}$ and
$\sigma$. The parameter $\sigma^2$ is the variance the OBER is designed
for, and controls the decision regions in $\mathbb{R}^{N+L}$ where
$\mathbf{y}$ takes its values. Thus, the OBER depends on the expected
SNR and is only optimal when the expected variance $\sigma^2$ and the
true variance agree.
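
Viewing the detector as a function of the received vector and a design standard deviation lends itself to a brute-force illustration for very short blocks. The sketch below makes several assumptions that are ours: equally likely sequences, a hypothetical two-tap channel with N = 3 and L = 1, a hypothetical received vector, and a sum over the full cube of density-weighted sequences (which equals the half-cube sum of the pairwise differences). It also computes a brute-force maximum likelihood sequence decision and a hard-limited matched filter for comparison at very small and very large design values.

```python
import math
from itertools import product

def channel_output(H, b):
    """Noiseless channel output H b for a bit block b."""
    return [sum(h * x for h, x in zip(row, b)) for row in H]

def sq_dist(y, H, b):
    """Squared Euclidean distance between y and H b."""
    return sum((yi - si) ** 2 for yi, si in zip(y, channel_output(H, b)))

def sign(v):
    return 1 if v > 0 else -1

def ober(y, H, sigma):
    """Bitwise decisions: sign of sum over all blocks of Gaussian weight times block."""
    n = len(H[0])
    blocks = list(product((-1, 1), repeat=n))
    dists = [sq_dist(y, H, b) for b in blocks]
    d_min = min(dists)  # common shift, avoids underflow for small sigma
    acc = [0.0] * n
    for b, d in zip(blocks, dists):
        w = math.exp(-(d - d_min) / (2.0 * sigma ** 2))
        for k in range(n):
            acc[k] += w * b[k]
    return tuple(sign(v) for v in acc)

def mlsd(y, H):
    """Brute-force maximum likelihood sequence decision."""
    return min(product((-1, 1), repeat=len(H[0])), key=lambda b: sq_dist(y, H, b))

def matched_filter(y, H):
    """Hard-limited matched filter sign(H^T y)."""
    n = len(H[0])
    return tuple(sign(sum(H[r][k] * y[r] for r in range(len(y)))) for k in range(n))

# Hypothetical channel with taps (1, 0.5) and a hypothetical received vector.
H = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.5, 1.0], [0.0, 0.0, 0.5]]
y = [0.7, 0.0, 0.4, -0.6]
```

For this particular `y` the sequence decision and the matched filter disagree in the middle bit, so the two design extremes give visibly different outputs: `ober(y, H, 0.05)` agrees with `mlsd(y, H)`, while `ober(y, H, 100.0)` agrees with `matched_filter(y, H)`.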

III. The behaviour of the OBER

We will discuss asymptotic properties of the OBER by studying the
function $\Gamma(\cdot, \cdot)$ as defined in (4). Assuming that all
sequences are equally likely to be transmitted, we show that

$$\lim_{\sigma \to 0} \Gamma(\mathbf{x}, \sigma) = \hat{\mathbf{b}}_{\mathrm{MLSD}}(\mathbf{x}), \quad \text{for all } \mathbf{x} \in \mathbb{R}^{N+L}, \qquad (5)$$

and that

$$\lim_{\sigma \to \infty} \Gamma(\mathbf{x}, \sigma) = \mathrm{sign}(H^T \mathbf{x}), \quad \text{for all } \mathbf{x} \in \mathbb{R}^{N+L}. \qquad (6)$$

Equation (5) means that the OBER designed for a high SNR becomes the
MLSD. This is why the MLSD achieves the minimum attainable bit-error
probability when used in systems with a high SNR, cf. [3]. In equation
(6) we find a similar comparison for low SNR between the OBER and the
matched filter with hard decisions. If the true SNR is low, the best
possible receiver, as far as the BER is concerned, is actually the
matched filter receiver.

where $C_k^+$ and $C_k^-$ are the halfcubes with
$(\mathbf{b})_{\text{bit } k} = +1$ and
$(\mathbf{b})_{\text{bit } k} = -1$, respectively. The Bayes decision
rule

Abstract — Building on Forney's concept of the genie [4], [5], and
introducing the idea of an explicit statistical description of the side
information provided to the genie-aided detector, we develop a generic
tool for derivation of lower bounds on the bit-error rate of any actual
receiver [3]. With this approach, the side information statistics
become design parameters, which may be chosen to give the resulting
bound a desired structure. To illustrate this, we choose statistics in
order to obtain a special case: the lower bound derived by Mazo [6].
The statistical description of the side information makes the lower
bounding a transparent application of Bayesian theory.

I. Introduction

The idea of a good genie with a corresponding genie-aided detector
(GAD) has, in particular, often been used to determine a lower
performance bound for the probability of bit-error [1], [2], [3], [4],
[5]. The GAD has access to more information than any actual detector:
it has access to the side information supplied by the genie and is
expected to handle all information optimally. Because of this, it is
argued, it cannot have a worse performance than any detector working
without the side information. However, in order that optimal processing
of the side information be well-defined in the sense of Bayesian
detection theory, an explicit (statistical) description of the side
information is required. This paper introduces such a representation of
the genie, augmenting the foundational ideas of Forney's work in [4],
[5]. Our aim is to introduce the side information supplied by the genie
as the output of a "side information channel" parallel to the original
channel and governed by a probabilistic rule with free parameters.

II. The Side Information Channel

Consider a transmission system where binary data is sent through a
discrete-time, additive Gaussian channel with intersymbol interference,
and where additional side information is carried to the detector
through a parallel channel (representing the genie).

We discuss the detection of bit number k in the important special case
when the side information consists of a pair of sequences and one of
the sequences is equal to the transmitted sequence, cf. [4], [5].
Define $C_k^+$ and $C_k^-$ as the sets of sequences with the bit in
position k as +1 and -1, respectively, and denote the side information
with $\mathbf{z}$ and its outcome with $\zeta$. Let $\zeta$ consist of
pairs in $C_k^+ \times C_k^-$ of the form
$\zeta_{ij} \triangleq (\beta_i^+, \beta_j^-)$, for
$1 \le i, j \le 2^{N-1}$. With the transmitted sequence being $\beta$,
let the additional sequence be chosen at random among the sequences
differing from $\beta$ in bit k, according to the known, probabilistic
transition rule:

$$\Pr\{\mathbf{z} = \zeta_{i,j} \mid \mathbf{b} = \beta\} = \begin{cases} p(j|i) & \text{if } \beta = \beta_i^+ \in C_k^+ \\ q(i|j) & \text{if } \beta = \beta_j^- \in C_k^- \\ 0 & \text{otherwise.} \end{cases}$$

Hence, the properties of the genie, or equivalently, the properties of
the output of the side information channel, are defined by the
statistics (or transition probabilities) $p(j|i)$ and $q(i|j)$.

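III. The Genie-Aided Detector
With the complete statistical description of the transmission system,
including a set of transition probabilities, the GAD with minimum
bit-error probability is derived in terms of a binary Bayesian
hypothesis test. By evaluating the performance of this GAD, a lower
bound on the probability of bit-error of any detector, with or without
access to the side information, is obtained as

$$P_{\mathrm{ber},k} \ge \sum_{i,j} Q\Big(\frac{d_{ij}}{2\sigma} + \frac{\sigma}{d_{ij}} \ln \lambda_{ij}\Big)\, q(i|j) \Pr\{\mathbf{b} = \beta_j^-\} + \sum_{i,j} Q\Big(\frac{d_{ij}}{2\sigma} - \frac{\sigma}{d_{ij}} \ln \lambda_{ij}\Big)\, p(j|i) \Pr\{\mathbf{b} = \beta_i^+\},$$

where $d_{ij}$ is the Euclidean distance between $\beta_i^+$ and
$\beta_j^-$,

$$\lambda_{ij} = \frac{q(i|j) \Pr\{\mathbf{b} = \beta_j^-\}}{p(j|i) \Pr\{\mathbf{b} = \beta_i^+\}}$$

and $Q(x) \triangleq (1/\sqrt{2\pi}) \int_x^{+\infty} e^{-t^2/2}\, dt$.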
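
A single pair term of such a bound is the error probability of a binary Bayesian test between two signals in Gaussian noise. The sketch below illustrates that building block only, under assumptions that are ours: `w_plus` and `w_minus` stand for the joint weights (transition probability times prior) attached to the two sequences of a pair, `d` is the Euclidean distance between the two hypotheses, `sigma` is the noise standard deviation, and the threshold follows from the standard likelihood-ratio test.

```python
import math

def Q(x):
    """Gaussian tail probability, computed through the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pair_error(d, sigma, w_plus, w_minus):
    """Weighted error of the Bayes test between two signals at distance d.

    w_plus and w_minus are the joint weights of the '+' and '-' hypotheses;
    lam = w_minus / w_plus shifts the decision threshold toward the less
    likely hypothesis.
    """
    lam = w_minus / w_plus
    shift = (sigma / d) * math.log(lam)
    error_given_plus = Q(d / (2.0 * sigma) - shift)
    error_given_minus = Q(d / (2.0 * sigma) + shift)
    return w_plus * error_given_plus + w_minus * error_given_minus

balanced = pair_error(d=2.0, sigma=1.0, w_plus=0.5, w_minus=0.5)
skewed = pair_error(d=2.0, sigma=1.0, w_plus=0.8, w_minus=0.2)
```

With equal weights the threshold sits midway and the term reduces to Q(d / (2 sigma)); with unequal weights the Bayes test does better than the midway threshold, which is the effect the free transition probabilities exploit.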

The transition probabilities $\{p(j|i), q(i|j)\}$ are free parameters
which can be chosen to optimize the performance of the GAD. They might
for example be chosen to make the corresponding bound tight, or to give
the bound a simple structure. We choose several sets of transition
probabilities as examples in order to discuss the properties of their
respective performance bounds. In this, we also discuss the relation of
the attainable performance bounds to the works by Forney [4], [5] and
Mazo [6].

Abstract - A new family of nonlinear decision delay-constrained
receivers minimizing the symbol-decoding-error probability of QAM- or
PSK-modulated digital information sequences transmitted over
time-dispersive time-varying noisy waveform channels is presented. New
(generally) time-varying Bhattacharyya-type upper bounds for the
performance evaluation of the proposed receivers are also presented.

Summary

In this work a novel solution for the optimal synthesis of nonlinear
receivers for the detection of digitally modulated (QAM or PSK)
information sequences transmitted over generally time-varying channels
impaired by known ISI and additive noise is presented for the case when the
decoding-decision-delay $\Delta$ is limited and finite. Receivers which
minimize the symbol error probability (i.e., symbol-by-symbol MAP decoders)
are considered.

The known solution presented in [1] for a similar problem holds for the
case of data transmission systems with time-invariant waveform channels
and unquantized soft-decision demodulation. Moreover, the algorithm in [1]
has been obtained by means of a direct application of Bayes' rule, so that
the resulting receiver complexity grows exponentially with the
decision-delay $\Delta$; as a consequence, the implementation of such
receivers for multilevel digital signalling seems to be unattractive even
for small values of $\Delta$ [4, Sect. 6.6].

In this work an $M$-level quantizer is assumed to be present at the output of the noisy waveform channel, so that the finite word-length effects of digital receivers can be suitably taken into account. Moreover, the approach followed to synthesize the MAP decoder is completely different from that in [1] and is based on modeling the ISI channel as a sequential Moore-type finite-state machine. This makes it possible to adopt the recursive Kalman-like algorithms of [2] to compute the sequence $\{\pi(k \mid k+\Delta),\ k \ge 1\}$ of the a posteriori probabilities (APPs) of the so-called "channel state" Markov chain $\{x(k) \in \mathcal{U} = \{u_1, u_2, \ldots, u_n\},\ k \ge 0\}$ (defined as in [3, Sect. II]) for every assigned decision delay $\Delta$. The main advantage of this approach is that the complexity of the resulting MAP decoders grows only linearly with the decision delay $\Delta$. In fact, the following expression, recursive in the index $k$, holds for the APP sequence:

$$\pi(k \mid k+\Delta) = A\,\pi(k-1 \mid k+\Delta-1) + \sum_{m=k}^{k+\Delta} P(k;m)\,\theta(m). \tag{1}$$

In (1) the matrix $A$ is the transition probability matrix of the Markov chain $\{x(k)\}$, and the sequences $\{P(k;m)\}$ and $\{\theta(m)\}$ can be recursively calculated as in [2]. Starting from the above APP sequence, the corresponding MAP estimate sequence $\{\hat{a}_{MAP}(k \mid k+\Delta) \in \mathcal{A}\}$ of the transmitted $S$-ary information sequence $\{a(k) \in \mathcal{A} = \{a_1, a_2, \ldots, a_S\},\ k \ge 0\}$ can be easily computed following quite standard procedures (see, for example, [3, Sect. IV]).
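As a rough illustration of what the APP sequence $\pi(k \mid k+\Delta)$ represents, the sketch below computes fixed-lag posteriors for a small finite-state Markov chain observed through a quantizer, using a plain forward-backward pass over the lag window. This is not the Kalman-like recursion of [2], whose sequences $P(k;m)$ and $\theta(m)$ are not reproduced here. The two-state chain, the emission table `B`, the row-stochastic convention for `A`, and all function names are illustrative assumptions.

```python
def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def fixed_lag_app(A, B, prior, obs, lag):
    """Return pi(k | k+lag) for every k with k + lag < len(obs).

    A[i][j] : P(x(k+1)=j | x(k)=i)   (row-stochastic; an assumed convention)
    B[i][y] : P(quantized output y | state i)
    prior   : distribution of x(0)
    """
    n = len(prior)
    # Forward filter: alphas[k][i] = P(x(k)=i | y(0..k)).
    alphas, pred = [], prior
    for y in obs:
        alpha = normalize([pred[i] * B[i][y] for i in range(n)])
        alphas.append(alpha)
        pred = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
    apps = []
    for k in range(len(obs) - lag):
        # Backward pass over the window: beta[i] is proportional to
        # P(y(k+1..k+lag) | x(k)=i).
        beta = [1.0] * n
        for m in range(k + lag, k, -1):
            beta = normalize([sum(A[i][j] * B[j][obs[m]] * beta[j] for j in range(n))
                              for i in range(n)])
        apps.append(normalize([alphas[k][i] * beta[i] for i in range(n)]))
    return apps

# Hypothetical two-state chain with a binary (M = 2) quantizer.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 0, 1, 1, 1, 0]
apps = fixed_lag_app(A, B, [0.5, 0.5], obs, lag=2)
map_states = [max(range(2), key=p.__getitem__) for p in apps]
```

Each entry of `apps` is a probability vector over the channel states; the MAP state estimate at step k is simply its largest component.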

As far as the performance evaluation of the above nonlinear MAP decoders is concerned, we observe that, to the authors' knowledge, no explicit analytical expressions are known in the literature (see [4, Sect. 6.6]). Starting from Eq. (1), new (generally) time-varying Bhattacharyya-type upper bounds on the performance of the proposed MAP decoders have been derived as follows:

$$P\big(\hat{a}_{MAP}(k \mid k+\Delta) \ne a(k)\big) \le \sum_{j=1}^{S} \sum_{\substack{r=1 \\ r \ne j}}^{S} \left\{ \sum_{m=1}^{M^{k+\Delta+1}} \sqrt{P\big(Y_0^{k+\Delta} = y_0^{k+\Delta}(m) \mid a(k) = a_r\big)\, P\big(Y_0^{k+\Delta} = y_0^{k+\Delta}(m) \mid a(k) = a_j\big)} \right\}, \tag{2}$$

where $y_0^{k+\Delta}(m)$ is the $m$-th value assumed by the ordered random sequence $Y_0^{k+\Delta}$, which consists of the quantized noisy data received at the channel output from step 0 to step $k+\Delta$. Simulation results showed that the upper bounds of Eq. (2) are quite tight for error probabilities below $10^{-2}$.
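The building block of any Bhattacharyya-type bound is the sum, over all possible observations, of the square root of the product of two conditional probabilities. The following minimal sketch evaluates that sum for small discrete distributions and accumulates it over all ordered pairs of hypotheses. The function names and the toy distributions are assumptions made for illustration; any prior-probability weighting used in the authors' bound is not modelled.

```python
import math

def bhattacharyya_coefficient(p, q):
    """Sum over outcomes m of sqrt(p[m] * q[m]) for two pmfs on the same alphabet."""
    return sum(math.sqrt(pm * qm) for pm, qm in zip(p, q))

def pairwise_bound(cond_pmfs):
    """Sum the Bhattacharyya coefficient over all ordered pairs (j, r) with r != j."""
    S = len(cond_pmfs)
    return sum(bhattacharyya_coefficient(cond_pmfs[j], cond_pmfs[r])
               for j in range(S) for r in range(S) if r != j)

# Toy example: two hypotheses, four quantized observation values.
p_given_a1 = [0.70, 0.20, 0.08, 0.02]
p_given_a2 = [0.02, 0.08, 0.20, 0.70]
bound = pairwise_bound([p_given_a1, p_given_a2])
```

The coefficient equals 1 for identical distributions and 0 for distributions with disjoint support, so the sum shrinks as the hypotheses become easier to distinguish.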

The performance of the proposed symbol-by-symbol MAP receivers has been compared with that of conventional maximum-likelihood (ML) sequence receivers (based on the classic Viterbi algorithm with optimized branch metric). Computer simulations showed that the presented receivers outperform the ML sequence receivers when the transmission channel is highly time-dispersive and the signal-to-noise ratio (SNR) at the receiver is quite low, so that the proposed decoders could be attractive, in particular, for HF channel equalization. Moreover, for the MAP decoders at hand a decision delay $\Delta$ of the order of the length $L$ of the channel impulse response (measured in multiples of the signalling period $T$) results in a negligible performance loss with respect to the ideal case $\Delta = \infty$, while a delay $\Delta$ of 5-6 times the length $L$ is in general required for the corresponding ML decoders.

As an illustrative example, Table I reports the bit error rate (BER) for a BPSK-modulated binary message sequence crossing the discrete-time baseband ISI channel of [4, Tab. 6.7.1], of length $L = 6$. Hard-decision demodulation and AWGN are assumed; the signal-to-noise ratio is evaluated at the input of the receiver's quantizer. Table II reports the corresponding steady-state values of the Bhattacharyya-like bound (2). In [5] the symbol-by-symbol MAP decoders described in this work are employed for decoding trellis-encoded data sequences. Finally, we observe that if the transmitted sequences are equiprobable, the proposed MAP receivers coincide with the corresponding symbol-by-symbol ML receivers.

[Tables I and II (length L = 6; sequence detectors (VA); Δ = L+1): the row and column layout did not survive extraction. Surviving entries: 0.0122, 0.3910; at SNR = 15: 0.0995, 0.0502.]

Abstract - Decision feedback suffers from the problem that wrong decisions degrade subsequent decisions by increasing the interference in the observation. MMSE-optimal feedback minimizes this residual interference power. Applications include decision-feedback equalization and delay estimation in code-division multiple-access (CDMA) systems.

I.  Introduction 

In many applications, one observation (e.g., a sequence) can give rise to decisions on many random variables. For optimal results in the maximum-likelihood (ML) sense, all random variables have to be estimated jointly. Decision feedback can be used as a less complex but suboptimal method. The estimate of each random variable in turn is fed back to the observation with the aim of reducing the influence of this random variable on further decisions. One application is decision-feedback equalization: data estimates are appropriately filtered and fed back to cancel the interference from the corresponding data symbol on future decisions. However, wrong decisions can increase the influence of a previously decided symbol instead of diminishing it. For example, a wrong decision on a binary antipodal symbol increases the interference power of that symbol in the observation by a factor of four. The problem arises from the implicit assumption of decision-feedback equalization that all decisions are correct.

II.  MMSE-Optimal  Feedback  Strategy 

Our purpose is to mitigate the detrimental effects of decision feedback by an improved feedback scheme. We treat the case where an observation $Y$ (e.g., an infinite-length sequence $Y[\cdot]$) can be expressed as a sum of two real-valued terms, one of which is independent of the random variable $X$ (e.g., a data symbol $X[n]$) to be detected. The observation can be written as

$$Y = Y_0 + f(X), \tag{1}$$

where $f(\cdot)$ denotes an arbitrary function. Every feedback scheme subtracts some function $r(Y)$ from the observation $Y$, and hence the latter becomes

$$Y' = Y_0 + f(X) - r(Y). \tag{2}$$

Since we are interested in minimizing the impact of $X$ on the observation $Y$, a reasonable criterion of goodness is the average residual power due to $X$ after cancellation. Therefore, one is interested in finding

$$r_o(y) = \arg\Big\{ \min_{r(\cdot)} E\big[\, \|f(X) - r(y)\|^2 \mid Y = y \,\big] \Big\}, \tag{3}$$

where $\|\cdot\|^2$ denotes the squared Euclidean norm.

The problem raised by (3) is an instance of the well-understood Bayesian (nonlinear) minimum mean-squared error (MMSE) estimation problem (see, e.g., [1, Section 7-5]). It follows that the MMSE-optimal feedback function is

$$r_o(y) = E[f(X) \mid Y = y], \quad \text{all } y. \tag{4}$$
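To make the conditional-mean rule concrete, consider the simplest scalar case, which is an assumption chosen here for illustration and is not taken from the paper: $Y = X + Z$ with $X = \pm 1$ equiprobable, $Z$ Gaussian with variance $\sigma^2$, and $f(X) = X$. Then $E[X \mid Y = y] = \tanh(y/\sigma^2)$. The sketch below numerically compares the residual power $E[(X - r(Y))^2]$ of hard-decision feedback, where each wrong decision costs $(\pm 2)^2 = 4$, with that of the conditional-mean feedback.

```python
import math

def gaussian_pdf(z, sigma):
    return math.exp(-0.5 * (z / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def residual_power(feedback, sigma, half_width=12.0, steps=20001):
    """E[(X - feedback(Y))^2] for Y = X + Z, X = +/-1 equiprobable, Z ~ N(0, sigma^2).

    Evaluated with the trapezoidal rule on y in [-half_width, half_width].
    """
    dy = 2.0 * half_width / (steps - 1)
    total = 0.0
    for n in range(steps):
        y = -half_width + n * dy
        weight = 0.5 if n in (0, steps - 1) else 1.0
        for x in (-1.0, 1.0):
            total += weight * 0.5 * gaussian_pdf(y - x, sigma) * (x - feedback(y)) ** 2 * dy
    return total

def hard_feedback(y):
    # Conventional decision feedback: subtract the sliced symbol.
    return 1.0 if y >= 0.0 else -1.0

def make_mmse_feedback(sigma):
    # Conditional mean E[X | Y = y] = tanh(y / sigma^2) for this scalar model.
    return lambda y: math.tanh(y / sigma ** 2)

sigma = 1.0
hard = residual_power(hard_feedback, sigma)
soft = residual_power(make_mmse_feedback(sigma), sigma)
```

For hard feedback the residual power equals four times the decision error probability; the conditional-mean feedback can never do worse in this mean-square sense, and the gap is widest at low SNR.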


III.  MMSE-Optimal  Feedback  Equalization 

In a decision-feedback scheme, the observation $Y$ corresponds to the received sequence $Y[\cdot] = \sum_{m \ge 0} g[m]\, X[\cdot - m] + Z[\cdot]$. After deciding on a transmitted symbol $X[k]$, a decision-feedback scheme subtracts the sequence

$$r^{(k)}[\cdot] = g[\cdot - k] \cdot \hat{X}[k]. \tag{5}$$

On the other hand, the MMSE-optimal scheme subtracts [3]

$$r_o^{(k)}[\cdot] = g[\cdot - k] \cdot E\big[X[k] \mid Y[\cdot] = y[\cdot]\big]. \tag{6}$$

IV.  Delay  Estimation  in  CDMA 
MMSE-optimal feedback can also be applied to estimate the relative transmission delays of the users of an asynchronous code-division multiple-access (A-CDMA) system. A possible scheme can be derived from successive cancellation [2]: the users' delays are estimated in turn and appropriately subtracted from the observation to improve subsequent estimates. Again, better performance for this general feedback scheme is achieved by using MMSE-optimal feedback. Figure 1 illustrates the gain for a specific A-CDMA system with randomly chosen, repeatedly emitted synchronization sequences.

Other promising multi-user applications include MMSE-optimal multi-user decision feedback and interference cancellation in CDMA data detection (cf. [4] for a related approach).

Fig. 1: Successive cancellation for A-CDMA; 31 equal-energy users with length-31 spreading sequences.

Abstract - This paper compares the Coded Orthogonal Frequency Division Multiplexing (COFDM) system and a single-carrier system using decision feedback equalization (DFE) in a Rayleigh-fading environment, assuming perfect knowledge of the channel and ignoring error propagation in the DFE. Analytic techniques are introduced to bound the average probability of error of a single-carrier system using decision-feedback equalization and the average probability of error of a COFDM system in a two-path fading channel.

I. Introduction

We compare a single-carrier broadcast system using decision feedback equalization (DFE) to COFDM in a slowly Rayleigh-fading environment. Analytic techniques are introduced for bounding the probability of error for systems using DFE and COFDM. Diversity is a well-known technique to reduce the average probability of error in a fading channel [1], [2]. This paper shows how the inherent diversity of a single-carrier system using a DFE is equivalent to the inherent diversity of a COFDM system in a two-path fading channel with proper coding and interleaving.

II. Diversity Calculations for a DFE in a Two-Tap Fading Channel

We consider the performance of a DFE when the received channel pulse, after any receiver filtering and symbol-spaced sampling, is a two-tap channel. To upper-bound the probability of error of the DFE, we consider the zero-forcing DFE, because a zero-forcing DFE will have a higher probability of error than a DFE [2]. The zero-forcing DFE converts the channel pulse response to one that is causal, monic and minimum-phase. It then subtracts the postcursor ISI. Consider a two-tap channel pulse response, $h(D) = h_0 + h_1 D$. If $|h_0| > |h_1|$, the feedforward section of the equalizer (including matched filtering) will be simply $1/h_0$ and the feedback section will be $(h_1/h_0)\,D$.

Without loss of generality, assume $E[x^2]/\sigma^2 = 1$. The resulting instantaneous SNR will be $|h_0|^2$. Now suppose that $|h_1| > |h_0|$. In this case, the $D$-transform of the feedforward section of the equalizer will be

$$W_{ZF\text{-}DFE}(D) = \frac{h_0^* + h_1^* D^{-1}}{h_1^*\,(h_1 + h_0 D^{-1})}$$

and the feedback section will be $(h_0^*/h_1^*)\,D$. In this case, the resulting SNR will be $|h_1|^2$. Therefore, the zero-forcing DFE in the two-tap channel case selects the larger of the two paths. This is equivalent to selection diversity for a two-antenna channel. Therefore, selection diversity provides an upper bound on the probability of error for a DFE in a Rayleigh-fading channel.
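The path-selection behaviour can be checked numerically. The sketch below returns the monic minimum-phase feedback tap and the slicer SNR of a zero-forcing DFE on $h(D) = h_0 + h_1 D$ (with $E[x^2]/\sigma^2 = 1$), and lets one verify that the result is a valid spectral factorization of the channel autocorrelation. The function names are hypothetical, and the feedback-tap expressions follow from standard spectral factorization rather than from formulas legible in the text.

```python
def zf_dfe_two_tap(h0, h1):
    """Feedback tap b and slicer SNR for h(D) = h0 + h1*D, with E[x^2]/sigma^2 = 1.

    The equivalent channel seen by the slicer is the monic response 1 + b*D.
    """
    if abs(h0) >= abs(h1):
        # Channel is already minimum phase: monic factor 1 + (h1/h0) D.
        feedback_tap = h1 / h0
        snr = abs(h0) ** 2
    else:
        # Reflect the zero inside the unit circle: monic factor 1 + (h0*/h1*) D.
        feedback_tap = h0.conjugate() / h1.conjugate()
        snr = abs(h1) ** 2
    return feedback_tap, snr

def autocorrelation(h0, h1):
    """Coefficients of D^0 and D^1 in h(D) h*(1/D*)."""
    return abs(h0) ** 2 + abs(h1) ** 2, h1 * h0.conjugate()

# Example with the second path stronger than the first.
h0, h1 = 0.3 + 0.4j, 1.0 - 0.5j
tap, snr = zf_dfe_two_tap(h0, h1)
r0, r1 = autocorrelation(h0, h1)
```

In both branches the SNR equals `max(abs(h0), abs(h1)) ** 2`, which is exactly the statistic of a two-branch selection combiner.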

JThis  work  was  supported  by  NSF  grant  2DPL133 
2 of  Information  Systems  Laboratory 
Stanford  University 
On the other hand, the matched-filter bound for a two-tap fading channel can be used to lower-bound the probability of error for a DFE [3]. Both the upper and lower bounds for a DFE in a two-path fading channel exhibit two-path diversity. Therefore a DFE exhibits two-path diversity in a fading channel.

III. Diversity Calculations for a COFDM System

Given a COFDM system with convolutional coding and interleaving across frequency tones, we can find the probability that a codeword is mistaken for its nearest neighbor by recognizing that the coded SNR is a quadratic sum of complex Gaussian random variables. This will give a conservative approximation to the amount of diversity inherent in a COFDM system. Given two paths in the channel separated by time $\tau$, we can write the SNR at tone $i$ as

$$W_i = |h_0|^2 + |h_1|^2 + h_0 h_1^* e^{j\omega_i \tau} + h_0^* h_1 e^{-j\omega_i \tau}, \tag{1}$$

where  and  T  is  the  width  of  each  tone  in  the  OFDM 

symbol. 

Now the coded SNR can be written as

$$\sum_{i \in I} \alpha_i W_i = \sum_{i \in I} \alpha_i \left(|h_0|^2 + |h_1|^2\right) + 2\,\mathrm{Re}\Big(\sum_{i \in I} \alpha_i h_0 h_1^* e^{j\omega_i \tau}\Big), \tag{2}$$

where $I$ is a set that indexes the differing tones between nearest-neighbor codewords and $\alpha_i$ adjusts the SNR to reflect the distance between coded symbols on a given branch of the trellis. Equation (2) has the same form as the instantaneous SNR of the matched filter bound for a two-tap fading channel found in [3]. We can use this to show that the diversity of the system is at most 2 for a two-path channel, regardless of the number of diversity branches of the code.
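A quick numerical check of the per-tone and coded SNR expressions is sketched below, assuming the per-tone SNR has the form $|h_0|^2 + |h_1|^2 + 2\,\mathrm{Re}(h_0 h_1^* e^{j\omega_i \tau})$, which is the squared magnitude of the two-path frequency response $h_0 + h_1 e^{-j\omega_i \tau}$. The tone frequencies, weights, and function names are illustrative assumptions.

```python
import cmath

def tone_snr(h0, h1, omega, tau):
    """Expanded form of the per-tone SNR for a two-path channel."""
    cross = h0 * h1.conjugate() * cmath.exp(1j * omega * tau)
    return abs(h0) ** 2 + abs(h1) ** 2 + 2.0 * cross.real

def coded_snr(h0, h1, omegas, alphas, tau):
    """Weighted sum of tone SNRs over the tones where two codewords differ."""
    return sum(a * tone_snr(h0, h1, w, tau) for a, w in zip(alphas, omegas))

# Hypothetical channel realization and a set of three differing tones.
h0, h1 = 0.8 + 0.3j, -0.4 + 0.5j
tau = 0.7
omegas = [0.5, 2.0, 3.5]
alphas = [1.0, 1.0, 2.0]
total = coded_snr(h0, h1, omegas, alphas, tau)
```

However many tones enter the sum, the result depends on the channel only through the two complex gains $h_0$ and $h_1$, which is why the diversity order cannot exceed two.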

IV.  Conclusions 

This paper introduces analytic techniques to bound the probability of error for both a single-carrier system using a DFE and a COFDM system. It shows analytically that in a two-path channel, both a single-carrier system using a DFE and a COFDM system with interleaving across the tones exhibit two-path diversity in the average probability of error.

Due to its low complexity and robust performance, the decision feedback equalizer (DFE) [1] continues to play an important role in high data rate and/or low cost systems, e.g., digital subscriber lines, magnetic recording and (possibly) mobile radio. Here we examine the possibility of recursive equalizers which perform soft-decisioning with complexity comparable to the DFE. The approach taken here involves initially making decisions like a DFE, followed by post-filtering of these decisions using a recursive (conditionally) linear filter structure; we call this a decision feedback filter (DFF). Significantly, the DFF can be set up to provably retain the performance capability of the DFE at high SNR, and empirically it improves performance over a wide range of SNR.

The DFF is derived assuming the usual AWGN FIR equivalent baseband model (it is possible to modify the DFF to take into account correlated noise as would arise from using a mean-square whitened matched filter in the front end). Starting with the fixed-lag Kalman filter (KF) [2], the current symbol estimate and error variance are isolated from the previous symbol estimates and error covariance. Then the current symbol (linear) estimate is replaced by a (nonlinear) MAP estimate (based on the approximation that the current observation is conditionally Gaussian, conditioned on the current symbol and past data), and the error variance is adjusted accordingly. The current symbol estimate is thus filtered and fed back, and eventually (after a fixed number of additional observations) thresholded to obtain the final estimate. Some simulation results demonstrate the improved BER of the DFF compared with the DFE.

A rigorous analysis of the DFF is performed. It turns out that the stability and performance of the DFF are related to the magnitude of the (computed) conditional error variance $p_k$ of the current symbol estimate. We identify two critical constants $\alpha, \beta$ ($\beta < \alpha$) with the following properties:

(i) If $\sup_k p_k < \alpha$, then the DFF state is mean-square bounded, uniformly as SNR $\to \infty$ (this is true even for nonminimum-phase channels, in contrast to the KF, which tends toward instability as SNR $\to \infty$ [3]).

(ii) If $\sup_k p_k < \beta$, then the DFF BER is asymptotically upper bounded by the DFE BER as SNR $\to \infty$.

Since $p_k$ is random in the DFF (because the current symbol estimate is nonlinear), these conditions would generally have to be imposed in order to guarantee one or both of the above properties, i.e., $p_k$ would be replaced by $\max(p_k, \gamma)$ for some $\gamma < \alpha$. These results are proved using variations on comparison techniques familiar in the analysis of recursive stochastic algorithms, and some basic results on DFEs [4]. The novelty in the analysis lies in the fact that the continuous-state DFF can be effectively compared to the discrete-state DFE.

Abstract - We present a novel scheme that combines decision feedback equalization (DFE) with high-rate error-detection coding in an efficient manner. The proposed scheme is evaluated both analytically and by means of comprehensive computer simulations. In our analysis, we introduce an approximate mathematical model taking into account the error propagation phenomenon. Both evaluation methods show that power savings of 2.5 dB to 3 dB over the conventional DFE can be achieved at the expense of a moderate complexity increase.

I.  Introduction 

Motivated by the desire to transmit the maximum possible data rate through a band-limited additive-noise channel with intersymbol interference (ISI), a considerable research effort has been devoted to equalization techniques for such channels. Various approaches to the equalization problem can be roughly divided into three classes: linear equalization, decision-feedback equalization (DFE), and maximum-likelihood sequence estimation (MLSE). DFE can significantly outperform the linear equalizer on channels with severe frequency attenuation. A major problem with the DFE, however, is error propagation. On the other hand, while MLSE is the most powerful technique, it is also the most complex to implement. Recently, a number of schemes, generally known as reduced-state sequence estimation (RSSE), were proposed [1, 2, 3] in an attempt to approach the performance of the MLSE at reduced complexity. Both [1] and [2, 3] are based on the idea of pruning the MLSE trellis, namely constructing only a small subset of all the paths in the trellis, and then selecting the most likely of these paths as the estimated sequence. The proposed scheme would be a further step in this direction, with the following two major differences. First, the path generation mechanism of RSSE schemes is controlled by some a priori determined rule. In contrast, we propose to generate the subset of paths in the trellis in accordance with the actual noise samples in the channel. Since with this approach additional complexity is introduced only where it is needed, one would have to consider, on average, very few paths. Second, we propose to significantly improve upon the performance of both RSSE and MLSE by integrating a simple high-rate error-detection code into the receiver structure.

II.  The  Proposed  Scheme 

The following is a simplified overview of the general ideas underlying the proposed scheme. The source data stream is partitioned into blocks of $k$ symbols, which are subsequently encoded into the codewords of a cyclic code of length $n$. Let $a_t$ denote the transmitted symbols, $v_t$ the noise samples, and $y_t = \sum_{i=0}^{M} a_{t-i} h_i + v_t$ the output sequence of an ideal zero-forcing feed-forward equalizer (FFE), where $\{h_i\}_{i=0}^{M}$ stands for the (postcursor) channel impulse response. Then the conventional DFE operates as follows:

$$z_t = a_t + \sum_{i=1}^{M} (a_{t-i} - \hat{a}_{t-i})\, h_i + v_t, \tag{1}$$

where $z_t$ is the signal at the slicer input, and $\hat{a}_t$ denotes the estimated symbol. We shall refer to the sequence $\{\hat{a}_t\}$ as the standard path. Note that at each time instant the channel noise may be estimated as $\hat{v}_t = z_t - \hat{a}_t$. The basic idea is to diverge from the standard path, i.e., open a new path in the trellis, only when the estimated noise value $\hat{v}_t$ is large. The same principle may then be employed for branching from each of the paths that are already followed.
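The standard path and its noise estimates can be sketched as follows for binary ($\pm 1$) symbols. This is a minimal illustration under stated assumptions, not the authors' receiver: the response is taken to be monic ($h_0 = 1$), the branching rule is a simple magnitude threshold, and the names `dfe_with_noise_flags` and `threshold` are hypothetical.

```python
def dfe_with_noise_flags(y, h, threshold):
    """Binary (+/-1) decision-feedback detection on y[t] = sum_i a[t-i] h[i] + v[t].

    h[0] is assumed to be 1 (monic response); h[1:] are the postcursor taps.
    Returns the standard-path decisions and the times at which the estimated
    noise magnitude exceeds `threshold` (candidate branching points).
    """
    decisions, branch_points = [], []
    for t, sample in enumerate(y):
        # Cancel postcursor ISI using past decisions.
        isi = sum(h[i] * decisions[t - i] for i in range(1, len(h)) if t - i >= 0)
        z = sample - isi                      # slicer input
        a_hat = 1.0 if z >= 0.0 else -1.0     # slicer decision
        decisions.append(a_hat)
        if abs(z - a_hat) > threshold:        # estimated noise is large
            branch_points.append(t)
    return decisions, branch_points

# Noise-free channel output plus one large noise sample at t = 2.
a = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
h = [1.0, 0.5, 0.2]
y = [sum(h[i] * a[t - i] for i in range(len(h)) if t - i >= 0) for t in range(len(a))]
y[2] += 0.6   # does not flip the decision, but the noise estimate is large
decisions, branch_points = dfe_with_noise_flags(y, h, threshold=0.3)
```

Only the flagged instants would spawn alternative paths, so the number of candidates stays small when the noise is mostly well below the threshold.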

Once all the paths have been generated as described above, they are processed in some fixed order and the first one that happens to be in the code is selected as the estimated sequence. Note that the total number of paths to be considered could still be quite large. However, we employ the structure of cyclic codes to implement the selection process with very low computational effort.

III.  Performance  Analysis 

In order to analyze the performance, we introduce an approximate mathematical model, which takes into account the error propagation using a Gilbert-Elliott channel model. That is, we assume that the signal $z_t$ in (1) can be described by a two-state Markov process, where one of the states is error-free while the other is the error-propagation state. Based on this model, upper and lower bounds on the overall probability of error are derived. These show that with the proposed method, the probability of error can be made several orders of magnitude lower than that obtained with the conventional DFE. In addition, comprehensive computer simulations have been performed. The simulations concur with the theoretical analysis, indicating a significant improvement over the conventional DFE. More specifically, simulation results for the HDSL channel test-loop #4, which is considered to be a difficult test channel with a considerable amount of ISI, show a reduction of three to four orders of magnitude in the overall system BER, which translates into savings of some 2.5 dB. For other HDSL channels, power savings of up to 3 dB were achieved.
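A two-state Markov error model of this kind has a simple closed-form average error rate. The sketch below computes the stationary state probabilities and the resulting error rate; the parameter names and the numerical values are hypothetical and are not taken from the paper.

```python
def gilbert_elliott_error_rate(p_enter, p_leave, pe_good, pe_bad):
    """Average symbol error rate of a two-state Markov (Gilbert-Elliott) model.

    p_enter : P(good -> bad), e.g. the chance that noise triggers error propagation
    p_leave : P(bad -> good), i.e. the chance that the DFE recovers
    pe_good, pe_bad : symbol error probabilities inside each state
    """
    pi_bad = p_enter / (p_enter + p_leave)   # stationary probability of the bad state
    pi_good = 1.0 - pi_bad
    return pi_good * pe_good + pi_bad * pe_bad, (pi_good, pi_bad)

# Error-free good state, as in the model described above; other values are made up.
rate, (pi_good, pi_bad) = gilbert_elliott_error_rate(
    p_enter=1e-4, p_leave=0.25, pe_good=0.0, pe_bad=0.5)
mean_burst_length = 1.0 / 0.25   # expected sojourn time in the bad state, in symbols
```

With an error-free good state, the average error rate is governed entirely by how often the error-propagation state is entered and how long it lasts.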

Abstract - This work aims at providing near-optimal and sub-optimal receiver designs for digital communications in the presence of non-Gaussian noise and intersymbol interference (ISI). Potential applications include wireless indoor (office or factory floor) communications (e.g., [1]), which are characterized by ISI due to multipath fading and limited channel bandwidth and by non-Gaussian background noise.

In our problem the received signal, corrupted by ISI and additive non-Gaussian noise, is given by $r(t) = \sum_{m=-\infty}^{\infty} a_m h(t - mT) + n(t)$, where $a_m = \pm 1$ (binary signaling) and $h(t)$ represents the impulse response of the channel. We have to determine a sequence of $2J$ transmitted bits $a_{-J}, \ldots, a_{J-1}$ from the received waveform over the observation interval. The problem of sequence estimation is posed as an $M$-ary hypothesis testing problem: for all $i$, $1 \le i \le M = 2^{2J}$, $H_i$ corresponds to the event that the sequence $A_i$ was sent, i.e.:

$$H_i : r(t) = \sum_{m=-J}^{J-1} a_m^{(i)} h(t - mT) + n(t) = x(A_i, t) + n(t).$$

$r(t)$ is sampled at the rate $1/T' = L/T$, where $L$ denotes the number of samples over a single bit interval. Thus, we have $t_k = kT'$ and $r(t_k) = x(t_k) + n(t_k)$, i.e., $r_k = x_k + n_k$ with $x_k = \sum_m a_m h(t_k - mT)$. Then we can form a discrete representation of the form $R = X + N$, where $R = [r_1, r_2, \ldots, r_P]^T$, $X = [x_1, \ldots, x_P]^T$, and $N = [n_1, \ldots, n_P]^T$. $P$ is chosen such that all the bits creating ISI for the sequence considered are observed. If we assume that the impulse response $h(t)$ becomes zero for $t > NT$ and $t < -NT$, that is, the ISI is assumed to extend only over $K = 2N + 1$ adjacent bits, then we have $P = 2(J + N)L$. We consider sequences sufficiently long that $P \gg N$. The decision rule which minimizes the probability of error is: choose the sequence $A_i = \{a_m^{(i)}\}$ if

$$P_{r/H_i}(R/H_i) > P_{r/H_j}(R/H_j) \quad \forall j \ne i.$$

Under very low SNR and i.i.d. noise conditions, it can be shown that

$$P_{r/H_i}(R/H_i) \approx \Big[ 1 + \sum_{k=1}^{P} g(r_k)\, x_k(A_i) \Big]\, P_{r/H_0}(R/H_0),$$

where $H_0 : r(t) = n(t)$, $g(r_k) = -\frac{d}{dr_k} \ln P(r_k/H_0)$, and $x_k(A_i) = \sum_{m=-J}^{J-1} a_m^{(i)} h(t_k - mT)$. Thus, a sufficient statistic for detection is $\Lambda_i = \sum_{k=1}^{P} g(r_k)\, x_k(A_i)$. It should be emphasized that the problem treated here is that of coherent reception and that knowledge of the impulse response of the channel is required to compute $\Lambda_i$.

¹C. Cordier is with the Ecole Nationale Supérieure des Télécommunications, France; he held an internship with the ISR at the University of Maryland during the summer and fall of 1994.
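The correlator-type statistic can be illustrated with a toy detector. The sketch below assumes the usual locally optimum nonlinearity $g(r) = -\frac{d}{dr} \ln p(r)$, evaluates it for Laplacian noise (where it reduces to a scaled sign function), and picks the $\pm 1$ sequence that maximizes $\sum_k g(r_k)\, x_k(A_i)$ by exhaustive search. The Laplacian noise model, the one-sample-per-bit causal response `h`, and the function names are assumptions made for illustration; exhaustive search is only practical for very short sequences.

```python
from itertools import product

def laplace_score(r, scale=1.0):
    # g(r) = -d/dr ln p(r) for p(r) proportional to exp(-|r| / scale).
    return (1.0 if r > 0 else -1.0 if r < 0 else 0.0) / scale

def signal_samples(bits, h, n_samples):
    """x_k = sum_m a_m h[k - m] for a symbol-spaced causal FIR response h."""
    return [sum(a * h[k - m] for m, a in enumerate(bits) if 0 <= k - m < len(h))
            for k in range(n_samples)]

def detect(r, h, n_bits, score=laplace_score):
    """Pick the +/-1 sequence maximizing sum_k g(r_k) x_k(A_i)."""
    g = [score(rk) for rk in r]

    def statistic(bits):
        x = signal_samples(bits, h, len(r))
        return sum(gk * xk for gk, xk in zip(g, x))

    return max(product((-1, 1), repeat=n_bits), key=statistic)

# Noise-free check on a three-bit sequence through a two-tap response.
h = [1.0, 0.5]
sent = (1, -1, 1)
r = signal_samples(sent, h, n_samples=4)
decided = detect(r, h, n_bits=3)
```

Because the nonlinearity is bounded, a single very large noise sample contributes no more to the statistic than a moderate one, which is the appeal of this structure in impulsive noise.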

In the case of correlated noise samples, the maximization of $P_{r/H_i}(R/H_i)$ can be replaced by an $M$-ary classification problem involving binary hypothesis testing and pairwise likelihood ratios. For $L_{i,j} = P_{r/H_i}(R/H_i) / P_{r/H_j}(R/H_j)$, the decision process is:

- compute $L_{i,j}$, $i \ne j$, $i, j = 1, 2, \ldots, M$;
- decide $H_i$ if, for all $j \ne i$, $L_{i,j} > \eta_{i/j}$.

Since the computation of $L_{i,j}$ is intractable in a non-Gaussian environment, suitable approximations have to be employed. Two approaches are followed: (i) the Generalized Correlator (GC), as in the i.i.d. case; here we extend the work of [2], and low SNR conditions and large sample sizes are assumed; and (ii) the Linear Quadratic Detector (LQD) of [3], which can be designed to match $L_{i,j}$ under any SNR conditions and without having to resort to large numbers of noise samples. The generalized likelihood ratio is used here rather than the deflection. Only knowledge of the first- to fourth-order statistics of the observations, under both hypotheses, is required.

We derive the appropriate GC to fit $L_{i,j}$. Each likelihood ratio is then approximated by a statistic of the form

$$T_{i/j} = \sum_{k=1}^{P} \big(x_k(A_i) - x_k(A_j)\big)\, g_{i/j}(r_k) = \sum_{k=1}^{P} s_k^{(i/j)}\, g_{i/j}(r_k),$$

and the discrimination test between $H_i$ and $H_j$ becomes $T_{i/j} \underset{H_j}{\overset{H_i}{\gtrless}} \eta_{i/j}$. This memoryless discriminator is characterized by the nonlinearity $g_{i/j}$ and the threshold $\eta_{i/j}$. When all the nonlinearities $g_{i/j}$ are given, the corresponding thresholds can be determined so that the different tests form an appropriate partition of the observation space. Each nonlinearity is selected by maximizing the appropriate efficacy functional and solving the resulting integral equation numerically. However, we also need here an estimate of the impulse response of the multipath channel. Actually, for this we need the sample power and sample autocorrelation functions of the signal components under the sequences $i$ and $j$ and the marginal and bivariate pdfs of the background noise. The latter noise distributions can be obtained via histograms or kernel estimation; noise estimation can be done on-line as long as the signal level remains at sufficiently low SNR. On the other hand, the channel impulse response can be estimated by filtering out the background noise during the training stage.

Abstract - The paper deals with the synthesis of an asymptotically optimum diversity detector for the incoherent detection of a bandpass signal subject to slow and nonselective fading and embedded in correlated spherically invariant noise.

Summary 

The detection problem under consideration can be represented by the hypothesis test

$$\begin{aligned} H_0 &: \mathbf{r}_p = \mathbf{n}_p, \\ H_1 &: \mathbf{r}_p = \frac{\gamma}{\sqrt{N}}\, A_p e^{j\theta_p}\, \mathbf{v} + \mathbf{n}_p, \end{aligned} \qquad p = 1, 2, \ldots, L, \tag{1}$$

where $\mathbf{r}_p$ and $\mathbf{n}_p$ are $N$-dimensional row vectors whose components are samples drawn from the complex envelopes of the received signal and the noise (respectively) on the $p$th diversity branch. The vector $\mathbf{v}$ represents the vector of the samples drawn from the complex envelope of the bandpass signal to be detected. The random variable (RV) $A_p$, which assumes nonnegative real values, accounts for the presence of a slow amplitude fading on the $p$th channel. The RV $\theta_p$ is assumed to be uniformly distributed over a $2\pi$ interval (incoherent detection). The RVs $A_p$ and $\theta_p$, and the noise vector $\mathbf{n}_p$ are mutually independent on each channel. Furthermore, amplitude fadings, phases, and noises on the different diversity channels are mutually independent. Finally, the signal amplitude is assumed to decrease as $\gamma/\sqrt{N}$ (with $\gamma$ a positive constant) so that the signal-to-noise ratio (SNR) is finite and not zero for any value of $N$.

The assumed spherically invariant (SI) noise model allows one [1,2] to write $\mathbf{n}_p = a_p \mathbf{g}_p$, where $a_p$ is a nonnegative RV independent of $\mathbf{g}_p$, which is a zero-mean complex Gaussian vector characterized by a $2N \times 2N$ correlation matrix $\sigma_{g_p}^2 K_p$, with $\sigma_{g_p}^2$ the common variance of the in-phase and quadrature components.

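The matrices $K_p$ ($p = 1, 2, \ldots, L$) admit the Cholesky decomposition $K_p = C_p C_p^T$, where $T$ denotes transposition and the $C_p$ are $2N \times 2N$ invertible lower triangular matrices. Therefore, the theorem of reversibility and the closure property of SI vectors under deterministic linear transformations [1] ensure that the detector synthesized on the basis of the hypothesis test

$$\begin{aligned} H_0 &: \mathbf{x}_p = \mathbf{w}_p, \\ H_1 &: \mathbf{x}_p = \frac{\gamma}{\sqrt{N}}\, A_p e^{j\theta_p}\, \mathbf{s}_p + \mathbf{w}_p, \end{aligned} \qquad p = 1, 2, \ldots, L, \tag{2}$$

retains the optimality properties of the detector synthesized starting from (1). In (2), $\mathbf{w}_p = \mathbf{w}_{pc} + j\mathbf{w}_{ps}$ is a white SI vector with modulating RV $a_p$, which is obtained by the transformation $(\mathbf{w}_{pc}, \mathbf{w}_{ps}) = (\mathbf{n}_{pc}, \mathbf{n}_{ps})(C_p^{-1})^T$. Moreover, $(\mathbf{x}_{pc}, \mathbf{x}_{ps}) = (\mathbf{r}_{pc}, \mathbf{r}_{ps})(C_p^{-1})^T$ and $(\mathbf{s}_{pc}, \mathbf{s}_{ps}) = (\mathbf{v}_c, \mathbf{v}_s)(C_p^{-1})^T$.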
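The whitening step can be sketched in a few lines. The code below computes a lower-triangular Cholesky factor $C$ with $K = C C^T$ and then whitens a row vector by solving $w\,C^T = n$, i.e. $w = n\,(C^{-1})^T$, by forward substitution. The $2 \times 2$ matrix is a made-up example and the function names are hypothetical; a real implementation would operate on the stacked in-phase and quadrature samples of each diversity branch.

```python
import math

def cholesky(K):
    """Lower-triangular C with K = C C^T for a symmetric positive-definite K."""
    n = len(K)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(C[i][k] * C[j][k] for k in range(j))
            if i == j:
                C[i][j] = math.sqrt(K[i][i] - s)
            else:
                C[i][j] = (K[i][j] - s) / C[j][j]
    return C

def whiten(row, C):
    """Solve w C^T = row for w, i.e. w = row (C^{-1})^T, by forward substitution."""
    w = []
    for i in range(len(row)):
        acc = row[i] - sum(C[i][k] * w[k] for k in range(i))
        w.append(acc / C[i][i])
    return w

# Made-up correlation matrix; its factor is [[2, 0], [1, sqrt(2)]].
K = [[4.0, 2.0], [2.0, 3.0]]
C = cholesky(K)
w = whiten([2.0, 1.0], C)   # [2, 1] is [1, 0] coloured by C^T, so w is close to [1, 0]
```

Applying the same `whiten` call to the received samples and to the signal samples yields the transformed quantities that enter the whitened hypothesis test.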


The asymptotically optimum (AO) detector can be synthesized starting from an asymptotic expression of the likelihood ratio on the $p$th channel conditioned on $A_p$ and $\theta_p$, which can be derived following an approach similar to that considered in [3]. The resulting decision statistic for the AO detector is

[Equation (3) did not survive extraction.]

where $E_{A_p}[\cdot]$ denotes the statistical expectation with respect to $A_p$, $P_p \triangleq \|\mathbf{s}_p\|^2 / N$ provides a measure of the signal power on the $p$th channel, $\|\cdot\|$ denotes the Euclidean norm, $I_0(\cdot)$ is the modified Bessel function of the first kind and zero order, and $*$ denotes complex conjugation.

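If one assumes that the fading RVs $A_p$ are Rayleigh distributed, the decision statistic becomes

[Equation (4): decision statistic for Rayleigh fading, a sum over the $L$ diversity branches involving $\|\mathbf{x}_p\|^2$ and $\gamma^2 E_{A_p}(A_p^2)\, N P_p$, minus a sum of logarithmic terms; its exact form did not survive extraction.]
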

The  main  advantage  of  the  proposed  AO  detector  is  that  its 
structure  does  not  depend  on  the  univariate  probability  den¬ 
sity  functions  (PDF’s)  of  the  noises  on  the  diversity  channels. 
The  synthesis  of  the  detection  structure,  however,  requires 
a  priori  knowledge  of  the  noise  correlation  matrices  and  the 
fading  PDF’s.  Then,  its  complexity  is  just  that  of  the  fully  op¬ 
timum  detector  for  a  correlated  Gaussian  noise  environment. 


The detection probability and the false-alarm rate of the
proposed AO detector in white SI noise depend on the signal
to be detected only through the mean (over fading) SNR's on
the diversity branches, and are therefore unaffected by the signal
shape. Consequently, the closure property of the SI vectors
under  deterministic  linear  transformations  assures  that  the 
performance  in  correlated  noise  can  be  easily  assessed  by  ex¬ 
ploiting  the  relationship  between  the  mean  SNR’s  at  the  input 
and  the  output  of  the  whitening  filters. 

Abstract — New upper and lower bounds to the
mean  recovery  time  of  decision  feedback  equalization 
(DFE)  are  derived.  The  recovery  time  is  defined 
as  the  time  it  takes  the  decision  feedback  equalizer 
(DFEQ)  to  reach  the  error  free  state  after  an  error 
has  corrupted  an  error  free  DFEQ.  The  derivations 
of  the  bounds  assume  a  causal  channel  response,  in¬ 
dependent  data  symbols,  and  independent  noise  sam¬ 
ples.  The  bounds  are  found  to  be  tighter,  especially 
at  large  SNR,  than  previous  bounds  in  a  numerical 
example. 

Intersymbol  interference  (ISI)  in  a  communication  system 
has  a  deleterious  effect  on  system  performance.  The  ISI  arises 
because  insufficient  channel  bandwidth  causes  the  pulses  to 
spread  into  adjacent  pulse  intervals  at  the  receiver  end.  This 
spreading  may  increase  or  decrease  the  noise  margin  of  the  re¬ 
ceived  signal  depending  on  the  relative  polarities  of  the  pulses. 
On  the  average,  however,  ISI  increases  the  bit  error  probabil¬ 
ity. 

One  of  the  methods  often  used  to  combat  the  effects  of  ISI 
is  to  use  DFE.  A  DFEQ  operates  by  reconstructing  the  por¬ 
tion  of  the  ISI  due  to  previously  transmitted  symbols  and  then 
subtracting  out  this  portion  from  the  received  signal.  The  re¬ 
construction  is  based  on  estimating  the  previously  transmitted 
symbols  and  the  channel  characteristics. 

Assuming  that  the  past  decisions  are  correct,  a  DFEQ  (with 
perfect  channel  identification)  can  eliminate  ISI  due  to  previ¬ 
ously  transmitted  symbols  in  the  span  of  the  feedback  filter 
completely.  However,  decision  errors  will  result  in  residual  ISI 
which  may  increase  the  probability  of  decision  error  in  the  fu¬ 
ture  detected  symbols.  This  leads  to  error  propagation  in  the 
DFEQ.  Analysis  of  a  DFEQ  is  difficult  because  little  is  known 
about  the  distribution  of  the  past  decision  errors. 

It  is  important  to  know  how  fast  a  DFEQ  can  recover  from 
an  error;  that  is,  how  many  symbol  intervals  it  takes  to  clear 
up  an  initial  error  introduced  into  the  feedback  filter.  Then 
one  knows  how  many  future  decisions  will  be  affected  by  the 
error.  When  the  DFEQ  has  a  finite  number  of  taps  in  the 
feedback  filter  and  the  system  response  has  a  finite  time  dura¬ 
tion,  the  communications  system  can  be  modelled  as  a  finite 
state  Markov  chain  as  shown  by  Monsen  [1]  and  Austin  [2]. 
Austin,  in  [2],  showed  how  to  obtain  the  mean  recovery  time 
exactly  through  quasi-simulations  and  discussed  bounding  the 
mean  recovery  time.  However,  both  of  Austin’s  approaches  re¬ 
quire  computational  efforts  that  grow  exponentially  with  the 
length  of  the  DFEQ.  The  mean  recovery  time  of  a  DFEQ  with 
error  state  transition  probabilities  of  1/2  was  also  computed 
in  [2].  Cantoni  and  Butler  [3]  derived  an  upper  bound  for 
the  mean  number  of  symbols  required  to  reach  the  zero  error 
state,  starting  from  an  arbitrary  initial  state  and  subject  to 
noise.  The  bound  depends  only  on  the  number  of  taps  in  the 
DFE  feedback  filter  and  the  number  of  signal  levels.  Kennedy 
and Anderson [4] extended, generalized, and clarified the contributions
in [3], and gave a class of channels for which the
upper bound in [3] is exactly the mean recovery time.
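
The Markov-chain view of recovery can be illustrated with a deliberately simplified sketch. It assumes (this is an assumption of the sketch, not a statement of the exact model in [2]) that while an error is still inside an N-tap feedback filter every new decision is correct with a fixed probability `p_correct`, and that recovery means N consecutive correct decisions. The function name and parameters are hypothetical. The state is the number of consecutive correct decisions since the last error, and the mean time to absorption is obtained from the first-step equations T_k = 1 + p T_{k+1} + (1 - p) T_0 with T_N = 0.

```python
from fractions import Fraction


def mean_recovery_time(num_taps, p_correct=Fraction(1, 2)):
    """Mean number of symbols until num_taps consecutive correct decisions.

    Writes T_k = a_k + b_k * T_0 and runs the recursion backwards from
    T_N = 0 (a_N = b_N = 0), then solves T_0 = a_0 / (1 - b_0).
    """
    a, b = Fraction(0), Fraction(0)
    for _ in range(num_taps):
        a, b = 1 + p_correct * a, p_correct * b + (1 - p_correct)
    return a / (1 - b)


if __name__ == "__main__":
    for n_taps in range(1, 6):
        print(n_taps, mean_recovery_time(n_taps))
```

With transition probabilities of 1/2 this toy chain gives 2^(N+1) - 2, which grows exponentially with the filter length; whether that matches a particular channel depends on how well the fixed-probability assumption holds.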

Duttweiler,  Mazo  and  Messerschmitt  in  [5]  developed  an 
aggregated  states  model  of  a  DFEQ  which  was  used  to  upper 
bound  the  average  error  probability.  Beaulieu  [6]  modified  the 
model  in  [5]  to  compute  upper  and  lower  bounds  for  the  mean 
recovery  time  by  writing  difference  equations  for  conditional, 
state  dependent,  mean  recovery  times.  He  also  provided  an¬ 
alytical  proofs  of  some  known  results  that  previously  were 
justified  with  intuitive  arguments.  Altekar  and  Beaulieu  de¬ 
veloped  models  in  [7]  that  lead  to  tighter  upper  bounds  on  the 
average probability of error of a DFEQ than those of [5].

In  this  paper,  new,  tighter  bounds  on  the  recovery  times  of 
DFE  are  derived  by  modifying  the  models  of  [7]  used  for  error 
probability  upper  bounds.  The  channel  is  modelled  as  a  linear, 
shift-invariant,  discrete-time  filter.  Using  appropriate  choices 
for  defining  states,  a  number  of  aggregated  states  models  of 
the  DFEQ  can  be  constructed.  Good  choices  for  state  models 
lead  to  improved  bounds  on  recovery  time  statistics.  Three 
models  are  constructed  here,  each  of  which  leads  to  new  and 
tighter  bounds.  A  single  errors  model,  a  double  consecutive 
errors  model  and  an  arbitrary  double  errors  model  are  defined 
and  used  to  derive  bounds  on  recovery  time  statistics. 

For  the  numerical  example  considered,  the  arbitrary  double 
errors  model  gives  the  tightest  bounds  for  the  mean  recovery 
time.  At  small  SNR  values,  the  new  bounds  from  the  three 
models  and  previous  bounds  coincide.  At  large  SNR  values, 
the  new  bounds  are  much  tighter  than  previous  ones.  In  par¬ 
ticular,  the  new  bounds  from  the  three  models  all  approach 
A,  the  length  of  the  DFEQ,  at  large  SNR. 

Abstract — A new lower bound on the undetected
error  probability  of  block  codes  is  presented  and  codes 
that  meet  this  lower  bound  are  characterized. 

I.  Introduction 

Let $C$ be an $(n, M)_q$ code, i.e., $C$ is a $q$-ary code of length $n$
and size $M$. We assume that each codeword is transmitted
with probability $1/M$ and that each letter is equally likely
to suffer from an error that changes it into any one of the
other $q - 1$ letters with probability $\epsilon/(q - 1)$, independently of
other letters. We assume in the following that $\epsilon \le (q - 1)/q$,
i.e., the probability that a letter is received correctly is at
least equal to the probability that it is received as any given
erroneous letter. For $w = 0, 1, \ldots, n$, let $A_w(c)$ be the number
of codewords at distance $w$ from the codeword $c$ and $A_w(C) =
\sum_{c \in C} A_w(c)/M$. The undetected error probability of the code
$C$ is given by

$$P_{ud}(C, \epsilon) = \sum_{w=1}^{n} A_w(C) \left(\frac{\epsilon}{q-1}\right)^{w} (1 - \epsilon)^{n-w}.$$


In  this  paper,  we  derive  a  lower  bound  on  the  undetected  error 
probability  for  linear  and  nonlinear  block  codes  and  present 
codes  that  meet  this  lower  bound.  In  particular,  these  codes 
are  optimal  for  error  detection. 

II.  Lower  Bound 

The following theorem gives a lower bound on $P_{ud}(C, \epsilon)$ for
any $(n, M)_q$ code $C$.

Theorem 1 Let $C$ be an $(n, M)_q$ code and $0 \le \epsilon \le (q - 1)/q$.
Then, the undetected error probability of the code $C$ satisfies
the bound


Wolf, Michelson, and Levesque derived a lower bound on
$P_{ud}(C, \epsilon)$ for any binary linear code $C$ of length $n$ and dimension
$k$ [2]. This bound has been recently generalized by Kløve
[1] to linear codes over arbitrary finite fields of size $q$. The
new bound described in this paper is not only more general
than the Kløve-Wolf-Michelson-Levesque (KWML) bound, in
the sense that it holds for linear and nonlinear codes while the
KWML bound holds only for linear codes, but it is also tighter.
In fact, the new lower bound equals the KWML bound only
if $k = n - 1$, $k = n$, $\epsilon = 0$, or $\epsilon = (q - 1)/q$. In all other cases,
the new lower bound is larger than the KWML bound.

1This  work  was  supported  in  part  by  NSF  under  grant  NCR 
91-15423. 


III.  Strictly  Optimal  Codes 

We say that a code $C$ is strictly optimal if its undetected error
probability equals the lower bound stated in Theorem 1 for all
$0 \le \epsilon \le (q - 1)/q$. The following result gives a combinatorial
characterization of strictly optimal codes.

Theorem 2 An $(n, M)_q$ code $C$ is strictly optimal if and only
if $C$ contains at least $\lfloor M/q^s \rfloor$ and at most $\lceil M/q^s \rceil$ codewords
that agree on any given $s$ indices, where $s = 1, \ldots, n$.
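
The combinatorial condition of Theorem 2 can be checked by brute force on small codes. The sketch below reads the condition as follows (an assumption about the intended meaning): for every choice of s coordinate positions and every pattern of symbols on those positions, the number of codewords showing that pattern lies between the floor and the ceiling of M/q^s. The function name `is_strictly_optimal` is hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def is_strictly_optimal(code, q=2):
    """Brute-force check of the floor/ceiling counting condition."""
    n, M = len(code[0]), len(code)
    for s in range(1, n + 1):
        lo = M // q**s            # floor(M / q^s)
        hi = -(-M // q**s)        # ceil(M / q^s)
        for idx in combinations(range(n), s):
            counts = Counter(tuple(c[i] for i in idx) for c in code)
            # A pattern that never occurs has count 0.
            smallest = min(counts.values()) if len(counts) == q**s else 0
            if smallest < lo or max(counts.values()) > hi:
                return False
    return True


words = list(product((0, 1), repeat=5))
two_of_five = [w for w in words if sum(w) == 2]
one_or_four = [w for w in words if sum(w) in (1, 4)]
even_weight_3 = [w for w in product((0, 1), repeat=3) if sum(w) % 2 == 0]

if __name__ == "__main__":
    print(is_strictly_optimal(two_of_five))    # 2-out-of-5 code
    print(is_strictly_optimal(one_or_four))    # 1-or-4-out-of-5 code
    print(is_strictly_optimal(even_weight_3))  # a small MDS code
```

The enumeration is exponential in n, so it is only meant for toy examples such as the length-5 codes discussed later in this paper.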

Hence, a necessary condition for a code to be strictly optimal is
that its Hamming distance is $n - \lceil \log_q M \rceil + 1$. The following
result shows that this condition is also sufficient if $M$ is an
integer power of $q$. In this case, an $(n, M)_q$ code $C$ is called
maximum distance separable (MDS) if its Hamming distance
equals $n - \log_q M + 1$.

Theorem 3 If $M$ is an integer power of $q$, then an $(n, M)_q$
code is strictly optimal if and only if it is MDS. In particular,
an $(n, M)_q$ linear code is strictly optimal if and only if it is
MDS.

If $M$ is not an integer power of $q$, then an $(n, M)_q$ code of
Hamming distance $n - \lceil \log_q M \rceil + 1$ may not be strictly optimal.
The following result determines necessary and sufficient
conditions for the existence of strictly optimal binary codes.

Theorem 4 A strictly optimal $(n, M)_2$ code, where $n$ and $M$
are positive integers and $M \le 2^n$, exists if and only if one of
the following conditions holds:

• $n \in \{1, 2, 3\}$.

• $n = 4$ and $M \notin \{3, 4, 12, 13\}$.

• $n \ge 5$ is odd and $M \in \{1, 2, (2^n - 2)/3, (2^n + 1)/3, 2^{n-1} - 1, 2^{n-1}, 2^{n-1} + 1, (2^{n+1} - 1)/3, (2^{n+1} + 2)/3, 2^n - 2, 2^n - 1, 2^n\}$.

• $n \ge 6$ is even and $M \in \{1, 2, (2^n - 1)/3, (2^n + 2)/3, 2^{n-1} - 2, 2^{n-1} - 1, 2^{n-1}, 2^{n-1} + 1, 2^{n-1} + 2, (2^{n+1} - 2)/3, (2^{n+1} + 1)/3, 2^n - 2, 2^n - 1, 2^n\}$.

As an application in which $M$ is not an integer power of $q$, we
consider binary-coded-decimal codes where $q = 2$ and $M = 10$.
It is interesting to note that the widely known 2-out-of-5
code, consisting of the ten binary sequences of length
$n = 5$ with exactly two ones, is not strictly optimal. On the
other hand, Theorem 4 indicates the existence of a $(5, 10)_2$
strictly optimal code. Indeed, the code consisting of all binary
sequences with exactly one or four ones is strictly optimal.
This 1-or-4-out-of-5 code has undetected error probability
$4\epsilon^2 - 8\epsilon^3 + 4\epsilon^4 + \epsilon^5$, while the undetected error probability of
the 2-out-of-5 code is $6\epsilon^2 - 18\epsilon^3 + 21\epsilon^4 - 9\epsilon^5$.
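
The two polynomials quoted for the length-5 codes can be reproduced from the definition of the undetected error probability. The following sketch (function names are hypothetical) computes the average distance distribution of a code by brute force and expands each term $(\epsilon/(q-1))^w (1-\epsilon)^{n-w}$ with the binomial theorem, returning the exact coefficients of the polynomial in $\epsilon$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def distance_distribution(code):
    """A_w(C): average number of codewords at distance w from a codeword."""
    n, M = len(code[0]), len(code)
    A = [Fraction(0)] * (n + 1)
    for c1 in code:
        for c2 in code:
            w = sum(x != y for x, y in zip(c1, c2))
            A[w] += Fraction(1, M)
    return A


def pud_coefficients(code, q=2):
    """Coefficients [c_0, ..., c_n] of P_ud(C, eps) as a polynomial in eps."""
    n = len(code[0])
    A = distance_distribution(code)
    coeffs = [Fraction(0)] * (n + 1)
    for w in range(1, n + 1):
        for j in range(n - w + 1):
            # eps^w (1 - eps)^(n - w) contributes C(n-w, j) (-1)^j eps^(w+j)
            coeffs[w + j] += A[w] * comb(n - w, j) * (-1) ** j / Fraction((q - 1) ** w)
    return coeffs


words = list(product((0, 1), repeat=5))
two_of_five = [w for w in words if sum(w) == 2]
one_or_four = [w for w in words if sum(w) in (1, 4)]

if __name__ == "__main__":
    print([int(c) for c in pud_coefficients(two_of_five)])
    print([int(c) for c in pud_coefficients(one_or_four)])
```

The coefficient lists are ordered from the constant term upward, so entry 2 is the coefficient of $\epsilon^2$.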

Abstract — The worst-case probability of undetected
error for a linear $[n, k; q]$ code used on a local binomial
channel is studied. For the two most important cases
it is determined in terms of the weight hierarchy of
the code. The worst-case probability of undetected
error for simplex codes is determined explicitly. A
conjecture about Hamming codes is given.


We  consider  a  couple  of  particular  classes  of  codes. 

The first class of codes we consider is the binary simplex
codes. For each $m > 1$ there is a binary simplex code $S_m$
with parameters $n = 2^m - 1$, $k = m$, $d_r = 2^m - 2^{m-r}$ for
$1 \le r \le m$.

Theorem 4 For $m \ge 3$, let


I.  Background 

The local binomial channel was defined implicitly by Korzhik
and Fink [2, page 193] and explicitly by Korzhik and
Dzubanov [1]. It is a channel which is a $q$-ary symmetric channel
for each transmitted symbol, but the symbol error probability
may vary from one transmitted symbol to the next.

Let $P_{ue}(C, \mathbf{p}) = P_{ue}(C, p_1, p_2, \ldots, p_n)$ denote the probability
of undetected error when a codeword from a linear $[n, k; q]$
code $C$ is transmitted over a local binomial channel with symbol
error probability $p_i$ for the $i$th transmitted symbol. Let the
worst-case error probability be defined by

$P_{wc}(C, v) = \max \{ P_{ue}(C, \mathbf{p}) \mid 0 \le p_i \le v \text{ for } 1 \le i \le n \}$.

The support of a vector $c$ is given by

$\chi(c) = \{ i \mid c_i \ne 0 \}$.

For a vector $c = (c_1, c_2, \ldots, c_n)$ and a set $X = \{i_1, i_2, \ldots, i_r\}$,
where $1 \le i_1 < i_2 < \cdots < i_r \le n$, we let

$c_X = (c_{i_1}, c_{i_2}, \ldots, c_{i_r})$.

For an $[n, k; q]$ code $C$ and a set $X$ as above, we define
$C_X = \{ c_X \mid c \in C \text{ and } \chi(c) \subseteq X \}$.

We use the notation $P_{ue}(C, p) = P_{ue}(C, p, p, \ldots, p)$ for the
probability of undetected error when $C$ is used on a $q$-ary
symmetric channel with error probability $p$.


II.  New  results 

Theorem 1 Let $C$ be an $[n, k; q]$ code. Then

$P_{wc}(C, v) = \max \{ P_{ue}(C_X, v) \mid X \subseteq \{1, 2, \ldots, n\} \}$.

Theorem 2 Let $C$ be an $[n, k, d; q]$ code. Then

$$P_{wc}(C, 1) = \frac{1}{(q-1)^{d-1}}.$$


Theorem  3  Let  C  be  an  [n,  k,  d ;  g]  code.  Let 


1  <r<k  and  dr  =  d\  +  (r  • 


$v_0(m) = 1 - (2^m - 1)^{-1/(2^{m-1} - 1)}$.

Then

$P_{wc}(S_m, v) = (2^m - 1) v^{2^{m-1}} (1 - v)^{2^{m-1} - 1}$

for $0 \le v \le v_0(m)$ and

$P_{wc}(S_m, v) = v^{2^{m-1}}$

for $v_0(m) \le v \le 1$.

A  similar  theorem  is  true  for  the  first  order  Reed-Muller  codes. 
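
The worst-case probability for a small code can be computed directly. For a binary linear code the undetected error probability on a local binomial channel is affine in each $p_i$ separately, so its maximum over the box $0 \le p_i \le v$ is attained at a vertex where every $p_i$ is either 0 or $v$; a vertex with $p_i = v$ exactly on a set $X$ reduces to the shortened code $C_X$ on a symmetric channel. The sketch below (hypothetical function names, binary codes only) enumerates all such sets for the simplex code $S_3$ and compares the result with the two-branch expression $\max(7 v^4 (1-v)^3, v^4)$, which is what the simplex-code theorem gives for $m = 3$ as far as the formula can be read from this text.

```python
from itertools import product


def simplex_codewords(m):
    """All codewords of the binary simplex code of length 2^m - 1."""
    cols = [v for v in product((0, 1), repeat=m) if any(v)]
    return [
        tuple(sum(a * b for a, b in zip(msg, col)) % 2 for col in cols)
        for msg in product((0, 1), repeat=m)
    ]


def worst_case_pue(code, v):
    """Maximum of P_ue over all vertices p_i in {0, v} (binary linear code)."""
    n = len(code[0])
    nonzero = [c for c in code if any(c)]
    best = 0.0
    for mask in product((0, 1), repeat=n):  # mask[i] = 1 means p_i = v
        size = sum(mask)
        total = 0.0
        for c in nonzero:
            # Codewords with a nonzero symbol where p_i = 0 contribute nothing.
            if all(m or not x for m, x in zip(mask, c)):
                w = sum(c)
                total += v**w * (1 - v) ** (size - w)
        best = max(best, total)
    return best


if __name__ == "__main__":
    S3 = simplex_codewords(3)
    for v in (0.1, 0.3, 0.47, 0.6, 0.9):
        print(v, worst_case_pue(S3, v), max(7 * v**4 * (1 - v) ** 3, v**4))
```

The enumeration visits all $2^n$ subsets, so it is practical only for very short codes.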

The binary Hamming codes $H_m$, where $m > 1$, have parameters
$n = 2^m - 1$, $k = 2^m - 1 - m$, $d = 3$. We conjecture
that the following result is true for all $m$ (it is true for $m < 4$).

Conjecture 1 Define $g_r(v)$ for $r \ge 2$ by

$g_r(v) = \frac{1}{2^r}\left(1 + (2^r - 1)(1 - 2v)^{2^{r-1}}\right) - (1 - v)^{2^r - 1}$.

Let $v_1 = 1$, and for $r \ge 2$ let $v_r$ be the root of the equation
$g_r(v) = g_{r+1}(v)$ in the interval $(0, 1)$.

Then $v_1 > v_2 > v_3 > \cdots$,

$P_{wc}(H_m, 0, v) = g_m(v)$
for $0 \le v \le v_{m-1}$, and

$P_{wc}(H_m, 0, v) = g_r(v)$

for $v_r \le v \le v_{r-1}$ and $r = 2, 3, 4, \ldots, m - 1$.
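
The function $g_r(v)$ in the conjecture has the form of the well-known undetected error probability of the binary Hamming code $H_r$ on a binary symmetric channel with crossover probability $v$. As a sanity check, the sketch below (hypothetical function names) evaluates that closed form and compares it with a brute-force sum over the codewords of $H_3$, obtained here by assuming the parity-check matrix whose columns are the binary expansions of 1, ..., 7.

```python
from itertools import product


def g(r, v):
    """Closed form (1 + (2^r - 1)(1 - 2v)^(2^(r-1))) / 2^r - (1 - v)^(2^r - 1)."""
    return (1 + (2**r - 1) * (1 - 2 * v) ** (2 ** (r - 1))) / 2**r - (1 - v) ** (2**r - 1)


def hamming_pue(m, v):
    """Brute-force P_ue of the binary Hamming code of length 2^m - 1."""
    n = 2**m - 1
    cols = [[(j >> b) & 1 for b in range(m)] for j in range(1, n + 1)]
    total = 0.0
    for x in product((0, 1), repeat=n):
        if not any(x):
            continue
        syndrome_zero = all(
            sum(x[i] * cols[i][b] for i in range(n)) % 2 == 0 for b in range(m)
        )
        if syndrome_zero:
            w = sum(x)
            total += v**w * (1 - v) ** (n - w)
    return total


if __name__ == "__main__":
    for v in (0.05, 0.1, 0.3, 0.5):
        print(v, g(3, v), hamming_pue(3, v))
```

For $r = 2$ the closed form collapses to $v^3$, the undetected error probability of the length-3 repetition code.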
where $d_1, d_2, \ldots, d_k$ is the weight hierarchy of $C$. Then

$$P_{wc}(C, (q-1)/q) = \frac{q^s - 1}{q^{d+s-1}}.$$

Abstract — A $q$-ary $(n, k)$ linear code is said to be
proper if, as an error-detection code, the probability
of undetectable error, $P_{ud}$, satisfies $P_{ud} \le q^{-(n-k)}$
for completely symmetric channels. In this paper, we
show that a proper code, as an error-correction code,
satisfies the expurgated bound on the decoding error
probability for a class of channels with the associated
Bhattacharyya distance being completely symmetric.
Known results on the undetectable error probability
then immediately imply that the expurgated exponent
is satisfied by many codes which are regarded as
good codes.


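let

$$\tilde{P}(a' \mid a) = \frac{r(a') \left[\sum_b \sqrt{P(b \mid a) P(b \mid a')}\right]^{s}}{\sum_{a''} r(a'') \left[\sum_b \sqrt{P(b \mid a) P(b \mid a'')}\right]^{s}}$$

be the channel induced from $P$, then we have

$$P_e^{s}(x_i) \le \exp\{-n s E_x(1/s, r) + n \log q\} \times \sum_{j \ne i} \exp\Big\{ n \sum_{a, a'} V_{i,j}(a, a') \log \tilde{P}(a' \mid a) \Big\}, \quad (3)$$

where the expurgated exponent is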


I.  Introduction 

Random coding arguments tell us that most codes satisfy
the random coding bound and that most expurgated
codes satisfy the expurgated bound asymptotically, but most
of the time we cannot tell whether a specific code satisfies such a
bound. However, we can show that proper codes and asymptotically
proper codes satisfy, or asymptotically satisfy, the expurgated
bound.

Before the works of Leung-Yan-Cheong et al. [1, 2], it had
been believed that the probability of undetectable error was
upper bounded by $q^{-(n-k)}$ whenever a $q$-ary $(n, k)$ linear code
was used for error detection over a $q$-ary symmetric channel.
They showed some examples of codes which do not satisfy
this bound, and called a code which satisfies the $q^{-(n-k)}$ bound
a proper code. Subsequent works suggest that proper codes
are also good as error-correction codes. In fact, it is shown
that proper codes satisfy the asymptotic Gilbert-Varshamov
bound on the minimum distance [3]. In this paper, we show that
proper codes satisfy the expurgated bound.


II.  Error  probabilities 

If we use $C$ as an error-detection code for a DMC $Q$, then the
undetectable error probability when $x_i \in C$ is sent is written


where $V_{i,j}(a, a')$ is the joint type of $(a, a')$ in $(x_i, x_j)$.

On the other hand, if we use $C$ as an error-correction code
for another DMC $P$, then, from known arguments for the proof
of the expurgated bound, we have a bound


’(xo  < 


exp 


nYlog 


(2) 


where  s  is  any  non-negative  number. 

If we compare bounds (1) and (2), then we can notice some
similarity. In fact, for a probability mass function $r(a)$, if we


$$E_{ex}(R) = \max_{\rho \ge 1} \max_{p} \left[ E_x(\rho, p) - \rho R \right]$$


and the optimal $p$ is used for $r$. Now, the relationship between
(1) and (3) is obvious.

We  can  show  the  following  theorem: 

Theorem 1 For a given DMC $P$, suppose that $P$ is completely
symmetric. Then, the expurgated bound

$P_e \le \exp\{-n E_{ex}(R)\}$

holds for all proper linear codes.


III.  Concluding  remark 

Up to now, many codes have been shown to be proper, and the above
theorem then implies that those codes satisfy the expurgated
bound. Unfortunately, the expurgated bound is greater than
the known error bound for some codes such as the simplex
code. Thus, our result does not necessarily solve the whole problem.
However, if we note that the error bound is not known
for most practical codes, our result gives a useful tool
to obtain a first approximation of the error probability.

Abstract — We estimate the range where the distance
distribution of a code can be approximated by
the binomial distribution.

I.  Introduction 

The  binomial  distribution  is  a  well  known  approximation  to 
the  distance  spectra  of  many  classes  of  codes.  For  example,  it 
is  known  to  be  tight  for  the  weights  of  BCH  codes  with  fixed 
minimal  distance  and  of  growing  length.  In  general  the  range 
where the distance distribution is close to the binomial depends
essentially on the dual distance. In the talk we present
new bounds [1, 2, 3, 4] for this range for codes with the dual
distance  about  half  of  the  length  n  of  the  code,  and  for  codes 
with  the  dual  distance  growing  linearly  in  n. 

II.  BCH  CODES 

Let the distance distribution of a code $C$ be $B =
(B_0, \ldots, B_n)$, and let $B' = (B'_0, \ldots, B'_n)$ stand for the dual
spectrum, that is, $B'$ is determined by the MacWilliams transform.

Theorem 1 In the extended BCH code of length $n = 2^m$ and
minimum distance $2t + 2 < 2^{[(m+1)/2]} + 2$,

$B_i = 0$ for $i$ odd,

$B_i = \binom{n}{i} n^{-t} (1 + E_i)$ for $i$ even,


Using  the  theorem  we  can  analyze  some  particular  cases. 

Corollary 1 If $t = o(\sqrt{n})$, and $i$ grows linearly with $n$, $i = \alpha n$,
then

^ log2  I Ecn\  <  +  o(l). 

Corollary  2  If  t  =  o(n *),  i  =  o(y/n),  then 

\Ei\  <  V2ii/2  e2(t-1)2-*/2  n‘-‘/2  (i  +  o(i)). 

Now we show that the binomial approximation cannot be
too tight. Define

This is evidently the deviation of the $i$-th spectrum element
from the "expected" value given by the binomial distribution.

Theorem  2  Let  B[  =  0,  for  i  6  [1,  d[  -  1]  U  [4  +  1,  n  -  1]. 
Then 


%  +  [- 


n 

d>- 1.  Td' 


■d*- 1,  ,  rd\, 

.“V”] + 1^]  + 1 


For constant $t$ this estimate turns out to be asymptotically
tight for BCH codes with distance $d = 2t + 1 < \sqrt{n}$.

Another  bound  is  a  corollary  of  the  Parseval  identity. 

Theorem  3 

_n,  2  \n\2  d'2  d/2 

Zj_  __  l£L  V  — 

2—*  (™\  2n  2^  CU  * 

i-o  i=d>  \ij 


III.  Codes  with  linearly  growing  dual  distance 

Using  an  approach  similar  to  linear  programming  we  get 
the  following  results. 

Theorem 4 For $j/n \in \left(1/2 - \tfrac{1}{2}\sqrt{\delta'(2 - \delta')},\ 1/2 + \tfrac{1}{2}\sqrt{\delta'(2 - \delta')}\right)$,


Theorem  5  For  even  codes,  and  2j/n  £  (H — ^  ,  1 

0-2 s')2  \ 


log  B2j  =  log  +  0(log  n). 

Let  C  be  self-dual.  In  this  case  for  d  asymptotically  greater 
than  0.146447...  n  (if  such  codes  exist!)  we  can  guarantee  a 
wider  interval  of  binomiality. 

Theorem 6 If there exists a self-dual code of length $n$ with
$d > (1/2 - \sqrt{2}/4)\, n (1 + o(1))$ then


for  (1/4  -  73/12 )n  <j<  (1/4  +  73/12 )n,  j  ±  n/2. 


4 3  ,  /27r(n  —  j)(n  —  2j)  (2") 

One of the first results one meets in coding theory is that a binary linear
[n,k,d]-code, whose minimum weight is odd, can be extended to an
[n+1,k,d+1]-code.

This is one of the few elementary results about binary codes which does not
obviously generalize to q-ary codes. Although one can readily extend a q-ary
code, by adding a further check digit, it is not clear under what circumstances
such an extension will increase the minimum distance. The aim of this paper is
to give a simple sufficient condition for a q-ary [n,k,d]-code to be extendable
to an [n+1,k,d+1]-code. The result is indeed a generalization of the above
result for binary codes. It also generalizes a result for ternary codes due to van
Eupen and Lisonek [2], whose proof made use of quadratic forms. Our
generalization has an elementary proof.

Theorem 1. Let C be an [n,k,d]-code over GF(q) with gcd(d,q)=1 and with
all weights congruent to 0 or d (modulo q). Then C can be extended to an
[n+1,k,d+1]-code, all of whose weights are congruent to 0 or d+1 (modulo q).

Proof. Suppose x and y are two linearly independent vectors of length n over
GF(q) and suppose there are exactly z coordinate positions in which x and y
both have a zero entry. Considering the (q+1) x n matrix whose rows are the
vectors in the set $\{y\} \cup \{x + ay : a \in GF(q)\}$, and counting the number of
non-zero entries via rows and via columns, gives

$$w(y) + \sum_{a \in GF(q)} w(x + ay) = q(n - z) \equiv 0 \pmod{q}. \quad (1)$$

Let $C_0 = \{x \in C : w(x) \equiv 0 \pmod{q}\}$. If $x, y \in C_0$ then (1) implies that

$$\sum_{a \in GF(q) \setminus \{0\}} w(x + ay) \equiv 0 \pmod{q}.$$

By the hypothesis of the theorem, each of these q-1 weights is congruent to 0
or d, and since gcd(d,q)=1 the only possibility is that $w(x + ay) \equiv 0$
(mod q) for all a. Hence $C_0$ is a linear subcode of C.

Furthermore, $C_0$ has dimension k-1. For otherwise there exists a two-
dimensional subcode D of C all of whose non-zero codewords have weight
congruent to d (mod q). But then, if x, y are linearly independent codewords
in D, we have

$$w(y) + \sum_{a \in GF(q)} w(x + ay) \equiv (q+1)d \equiv d \not\equiv 0 \pmod{q},$$

contradicting (1).

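Let G be a generator matrix of C of the form

$$G = \begin{pmatrix} G_0 \\ x \end{pmatrix},$$

where $G_0$ generates $C_0$ and $x \in C \setminus C_0$. Then the matrix

$$\begin{pmatrix} G_0 & 0 \\ x & 1 \end{pmatrix}$$

generates an [n+1,k,d+1]-code with the required property. □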
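
The construction in the proof can be tried on small examples. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it works only for prime q (arithmetic modulo q), assumes the hypotheses of Theorem 1 hold for the input code, and uses hypothetical function names. It picks a codeword x outside $C_0$ and appends to each codeword c the unique digit $\lambda$ for which $c - \lambda x$ lies in $C_0$, which is the extra coordinate produced by the extended generator matrix.

```python
from itertools import product


def codewords(G, q):
    """All codewords generated by the rows of G over GF(q), q prime."""
    k, n = len(G), len(G[0])
    return [
        tuple(sum(m[i] * G[i][j] for i in range(k)) % q for j in range(n))
        for m in product(range(q), repeat=k)
    ]


def weight(c):
    return sum(1 for x in c if x)


def extend(G, q):
    """Return (d, extended codewords), assuming Theorem 1's hypotheses hold."""
    C = codewords(G, q)
    d = min(weight(c) for c in C if any(c))
    x = next(c for c in C if weight(c) % q != 0)  # a codeword outside C_0
    extended = []
    for c in C:
        lam = next(
            l for l in range(q)
            if weight(tuple((ci - l * xi) % q for ci, xi in zip(c, x))) % q == 0
        )
        extended.append(c + (lam,))
    return d, extended


if __name__ == "__main__":
    ternary = [[1, 2, 0], [0, 1, 2]]  # [3,2,2] zero-sum code over GF(3)
    d, ext = extend(ternary, 3)
    print(d, min(weight(c) for c in ext if any(c)))
```

For the ternary zero-sum code of length 3 this yields a [4,2,3] code, and for the binary [7,4,3] Hamming code it reproduces the usual parity extension to [8,4,4].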

Theorem  1  can  be  useful  in  classifying  codes  with  given  parameters  or  in 
showing  non-existence.  Examples  for  ternary  codes  are  given  in  [1]  and  [2]. 
We  give  here  two  other  examples. 

Example 1. We will prove the uniqueness of $[q^2, 4, q^2 - q - 1]$-codes over
GF(q).

It is known that there exists an optimal $[q^2 + 1, 4, q^2 - q]$-code over
GF(q) which meets the Griesmer bound. The code is unique because the
columns of a generator matrix form a $(q^2 + 1)$-cap in PG(3,q) and hence must
be an elliptic quadric [4]. Let C be a $[q^2, 4, q^2 - q - 1]$-code. The residual
code of C with respect to a codeword of weight $q^2 - t$ ($2 \le t \le q - 1$) is a
$[t, 3, t - 1]$-code, which cannot exist by the Griesmer bound. So the only possible
weights of C are $q^2 - q - 1$, $q^2 - q$, $q^2 - 1$ and $q^2$. By Theorem 1, C can
be extended to a $[q^2 + 1, 4, q^2 - q]$-code. Finally the uniqueness of the
punctured $[q^2, 4, q^2 - q - 1]$-code follows from the fact that an elliptic
quadric admits a transitive automorphism group.

Remark. Example 1 provides a simple alternative proof of the well known
fact that every $q^2$-cap in PG(3,q) is contained in a $(q^2 + 1)$-cap, a result
whose geometric proof is fairly long (see e.g. [4]).

Example  2.  It  was  shown  in  [3]  that  there  does  not  exist  a  [28,5,19]-code  over 
GF(4).  The  proof  can  be  simplified  by  using  Theorem  1.  It  is  straightforward 
to  show  that  such  a  code  has  no  codewords  of  weight  21,22,25  or  26  and 
hence  can  be  extended  to  a  [29,5,20]-code  which  had  already  been  shown  not 
to  exist. 

Abstract — Tabu search is a stochastic method for
combinatorial  optimization.  It  is  shown  how  this 
method  can  be  used  to  construct  various  record- 
breaking  codes. 

I.  Introduction 

The  problem  of  designing  good  codes  can  be  seen  as  an  op¬ 
timization  problem.  Unfortunately,  many  instances  of  this 
problem  are  so  hard  that  methods  that  provably  find  best 
possible  codes  with  respect  to  given  criteria  cannot  be  used 
in  practice.  During  the  last  decade,  a  lot  of  interest  has  been 
focused  on  stochastic  methods  for  finding  optimal  and  near- 
optimal  solutions  of  difficult  optimization  problems.  Simu¬ 
lated  annealing  has  turned  out  to  be  a  very  promising  such 
method.  In  1987,  El  Gamal  et  al.  [1]  showed  that  simulated 
annealing  can  be  used  in  the  construction  of  several  types 
of  codes:  constant  weight  codes,  source  codes,  and  spherical 
codes.  Since  then,  simulated  annealing  and  other  stochastic 
methods  have  successfully  been  used  in  many  papers  to  con¬ 
struct  codes.  For  a  survey  of  these  results,  see  [3]. 

II.  Tabu  Search 

Tabu  search  [2]  is  a  combinatorial  optimization  method  which 
in  many  recent  studies  has  turned  out  to  outperform  other 
stochastic  methods,  including  simulated  annealing.  One  char¬ 
acteristic  of  tabu  search  is  that  it  finds  good  near-optimal 
solutions  early  in  the  optimization  run.  Tabu  search  follows 
the  steepest  descent  heuristic,  but  has  additional  features  to 
avoid  getting  stuck  in  local  optima. 

At  each  step  in  the  optimization  process,  a  set  of  solutions 
that  slightly  differ  from  the  current  solution  is  evaluated.  The 
solutions  in  this  set  are  said  to  be  neighbors  of  the  current 
solution.  In  the  neighborhood,  a  new  solution  that  is  best  with 
respect  to  the  cost  function  used  is  chosen.  However,  some  of 
the  neighbors  must  not  be  chosen,  namely  those  obtained  by 
inverses  of  one  of  the  L  most  recent  moves.  The  list  of  these 
forbidden moves, which has length L, is called the tabu list.

III.  Constructing  Codes  Using  Tabu  Search 

Tabu  search  can  be  applied  to  several  construction  problems  in 
coding  theory.  In  the  search  for  a  code  with  given  parameters, 
the  number  of  codewords  is  fixed  and  the  problem  is  formu¬ 
lated  as  an  optimization  problem.  For  example,  in  the  search 
for  coverings,  the  cost  function  can  be  taken  as  the  number  of 
uncovered  words  in  the  space;  a  covering  code  has  then  cost 
value  zero.  The  cost  function  of  error-correcting  codes  can 
similarly  be  taken  as  the  number  of  words  that  are  covered 
more  than  once  by  Hamming  spheres  around  the  codewords; 
another  approach  is  to  consider  the  mutual  distances  between 
the  codewords. 
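
The search loop described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it looks for a binary code with M codewords of length n and minimum distance d, takes as cost the number of codeword pairs at distance less than d (one way of "considering the mutual distances"), uses single bit flips as moves, and keeps the last `tabu_len` moves on a tabu list (a bit flip is its own inverse). All names and parameter defaults are assumptions made for this example.

```python
import itertools
import random
from collections import deque


def hamming(a, b):
    return bin(a ^ b).count("1")


def cost(code, d):
    """Number of codeword pairs closer than d; zero means a valid code."""
    return sum(1 for a, b in itertools.combinations(code, 2) if hamming(a, b) < d)


def tabu_search(n, M, d, tabu_len=10, max_steps=2000, seed=0):
    rng = random.Random(seed)
    code = [rng.randrange(1 << n) for _ in range(M)]  # codewords as n-bit integers
    best, best_cost = list(code), cost(code, d)
    tabu = deque(maxlen=tabu_len)
    for _ in range(max_steps):
        if best_cost == 0:
            break
        candidates = []
        for w in range(M):
            for bit in range(n):
                if (w, bit) in tabu:
                    continue  # forbidden: would undo a recent move
                code[w] ^= 1 << bit
                candidates.append((cost(code, d), rng.random(), w, bit))
                code[w] ^= 1 << bit
        if not candidates:
            break
        c, _, w, bit = min(candidates)  # best neighbor, ties broken at random
        code[w] ^= 1 << bit             # accepted even if it is worse
        tabu.append((w, bit))
        if c < best_cost:
            best, best_cost = list(code), c
    return best, best_cost


if __name__ == "__main__":
    found, remaining = tabu_search(7, 8, 3)
    print(remaining, [format(w, "07b") for w in found])
```

A returned cost of zero means the search found a code with the requested parameters; a positive cost only means this run did not, and says nothing about existence.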

A  direct  search  for  a  large  code  does  not  work  very  well. 
However,  such  a  code  can  be  found  by  imposing  a  structure  on 
it. This can be done by searching for a code that is a union of
cosets  of  a  linear  code  or  that  has  a  nontrivial  automorphism 
group. 

Said  and  Palazzo  [5]  were — to  our  knowledge — the  first  to 
apply  tabu  search  to  problems  in  coding  theory.  They  used  the 
method  to  construct  linear  error-correcting  codes.  Recently, 
Östergård [4] successfully applied tabu search to the construction
of covering codes. We present recent results on the application
of tabu search to code constructions. We discuss
covering,  error-correcting,  and  spherical  codes,  and  present 
new  codes  found  by  this  approach. 

Abstract.- This contribution presents the results of applying
two generic algorithms for reducing the complexity of the trellis
of a number of binary linear block codes.

I. Introduction.

The type of trellis considered was originally defined by Bahl et al.
[1] (the BCJR trellis). Later, both Wolf [2] and Massey [3] showed
that  this  type  of  trellis  is  useful  because  it  enables  Viterbi 
Algorithm  decoding  of  linear  block  codes,  which  in  turn  means  that 
soft-decision  techniques  can  be  simply  applied  to  improve  decoding 
performance.  More  recently  it  has  been  shown  that  the  BCJR  trellis 
is  uniquely  "minimal"  in  a  number  of  ways  [4,5].  It  is  most 
convenient  to  construct  this  minimal  trellis  from  the  "trellis 
oriented"  form  of  the  code  generator  matrix,  as  originally  presented 
by  Forney  [6],  and  later  developed  by  McEliece  [4,5],  who  called  it 
the  minimal  span  generator  matrix  (MSGM).  The  span  of  a  row  of 
the  generator  matrix  is  the  number  of  symbols  in  the  row  enclosed 
between  the  leftmost  non-zero  symbol  and  the  rightmost  non-zero 
symbol.  The  total  span  of  the  generator  matrix  is  the  sum  of  the 
span  of  the  rows  of  the  matrix.  Any  generator  matrix  can  be 
reduced  to  MSGM  form  by  means  of  elementary  row  operations 
(linear combinations and permutations of rows). The span of the
MSGM  is  a  useful  measure  of  the  complexity  of  the  code  trellis. 
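
The span definitions and the row-operation reduction described above can be illustrated with a small sketch. It assumes a binary code and a full-rank generator matrix; the function names and the example matrix are illustrative only and are not taken from the paper, and the reduction shown is a generic greedy procedure rather than the authors' implementation.

```python
def start(row):
    """Index of the leftmost non-zero symbol of a binary row."""
    return row.index(1)


def end(row):
    """Index of the rightmost non-zero symbol of a binary row."""
    return len(row) - 1 - row[::-1].index(1)


def span(row):
    """Number of symbols between the leftmost and rightmost non-zero symbol, inclusive."""
    return end(row) - start(row) + 1


def total_span(G):
    """Total span of a generator matrix: the sum of the spans of its rows."""
    return sum(span(r) for r in G)


def minimal_span_form(G):
    """Greedy row reduction over GF(2), assuming G has full rank.

    Whenever two rows start (or end) in the same column, the row whose span
    shrinks is replaced by the sum of the two rows. Each replacement strictly
    lowers the total span, so the loop terminates with all starts distinct
    and all ends distinct.
    """
    G = [list(r) for r in G]
    changed = True
    while changed:
        changed = False
        for i in range(len(G)):
            for j in range(len(G)):
                if i == j:
                    continue
                same_start = start(G[i]) == start(G[j])
                same_end = end(G[i]) == end(G[j])
                if (same_start and end(G[i]) >= end(G[j])) or \
                   (same_end and start(G[i]) <= start(G[j])):
                    G[i] = [a ^ b for a, b in zip(G[i], G[j])]
                    changed = True
    return G


# Hypothetical example: a (7,4) binary code in systematic form.
G_EXAMPLE = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
M_EXAMPLE = minimal_span_form(G_EXAMPLE)
print(total_span(G_EXAMPLE), total_span(M_EXAMPLE))
```

No column permutation is applied here, so the result is the minimal span form for the given coordinate ordering only.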

II.  Complexity  Reduction. 

In determining the MSGM of a given code, column permutations
are not allowed. It is easily observed, however, that column
permutation can lead to a lower total span matrix, corresponding to
an equivalent linear block code [3,7,8]. To date there is no known
algorithm which guarantees that the "globally minimal" MSGM
will be found, nor is there a test that tells us when it has been
reached, so we can only give comparative records. It is, however,
possible to determine a lower bound on its span, given by:

/=i 

where $n$ and $k$ are the block length and dimension of the code
respectively, $p_i$ is the dimension (or a bound on the dimension) of
the best "past" code at depth $i$ in the trellis (i.e., the optimum code
with block length $i$ and the same distance as the whole code) and
$f_{i-1}$ is the dimension of the best "future" code at depth $i-1$ (i.e., the
optimum code with block length $n-i+1$ and the same distance as the
whole code) [4,5,8]. The third column of Table 1 gives this lower
bound span, together with lower bounds on the numbers of edges
and vertices in the code trellis.

The  fourth  column  in  Table  1  gives  the  parameters  of  the  trellises 
obtained  from  the  MSGM  before  column  permutations.  The 
MSGM  is  in  turn  derived  from  the  standard  systematic  generator 
matrix  of  the  code  by  applying  a  greedy  row  operation  algorithm. 

The first algorithm for reducing the total span of the code MSGM
by column permutation is based on one devised by Wei Lin [7,9].
The steps of the algorithm are described in [8], and the results
obtained are given in the fifth column of Table 1. The second
algorithm  for  column  permutation  is  also  described  in  [8],  and  the 
results  appear  in  the  sixth  column  of  Table  1.  This  second 
algorithm  is  a  modified  and  extended  form  of  Wei  Lin's  algorithm, 
based  on  a  simulated  annealing  technique,  which  enables  improved 
results  to  be  obtained  even  for  quite  large  codes.  The  details  of  both 
algorithms  will  be  outlined  during  presentation  of  the  paper, 
together  with  further  results. 
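
As a rough illustration of the idea of searching over column permutations, the sketch below runs a textbook simulated annealing loop that swaps two columns at a time and scores each permutation by the total span of its minimal span form. This is not the algorithm of [8] nor Wei Lin's algorithm, whose steps are not given here; the move set, the cooling schedule and all names and parameter values are assumptions chosen for the example.

```python
import math
import random


def _start(r):
    return r.index(1)


def _end(r):
    return len(r) - 1 - r[::-1].index(1)


def msgm_span(G):
    """Total span after greedy reduction to minimal span form (binary, full rank)."""
    G = [list(r) for r in G]
    changed = True
    while changed:
        changed = False
        for i in range(len(G)):
            for j in range(len(G)):
                if i != j and (
                    (_start(G[i]) == _start(G[j]) and _end(G[i]) >= _end(G[j]))
                    or (_end(G[i]) == _end(G[j]) and _start(G[i]) <= _start(G[j]))
                ):
                    G[i] = [a ^ b for a, b in zip(G[i], G[j])]
                    changed = True
    return sum(_end(r) - _start(r) + 1 for r in G)


def permute(G, perm):
    """Reorder the columns of G: new column c is old column perm[c]."""
    return [[row[p] for p in perm] for row in G]


def anneal(G, steps=500, t0=2.0, cooling=0.99, seed=0):
    """Generic simulated annealing over column permutations (illustrative only)."""
    rng = random.Random(seed)
    n = len(G[0])
    perm = list(range(n))
    cur = best = msgm_span(G)
    best_perm = perm[:]
    t = t0
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)
        perm[a], perm[b] = perm[b], perm[a]          # propose a column swap
        cand = msgm_span(permute(G, perm))
        if cand <= cur or rng.random() < math.exp((cur - cand) / t):
            cur = cand                               # accept the move
            if cur < best:
                best, best_perm = cur, perm[:]
        else:
            perm[a], perm[b] = perm[b], perm[a]      # reject: undo the swap
        t *= cooling
    return best, best_perm


G_EXAMPLE = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
BEST_SPAN, BEST_PERM = anneal(G_EXAMPLE)
print(BEST_SPAN, BEST_PERM)
```

Because the best permutation seen so far is recorded separately from the current state, the returned span can never exceed that of the starting column order.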


III.  Conclusions. 

Table 1 indicates the significant reduction in the total span, as well
as in the other parameters, which can be obtained by means of the
two algorithms. In many cases the total span is quite close to the
lower bound. For the (32,16) extended BCH and (24,12) extended
Golay codes the bound is achieved. The matrix found for the latter
coincides with Forney's generator matrix for the code from the cubing
construction [6]. It must be borne in mind that the calculated bounds
cannot always be reached, as McEliece proves [5]. The relation between
span and complexity of the trellis is not direct: reaching the bound
on the span does not mean reaching the bounds on the numbers of
trellis elements. We conjecture, however, that the converse does hold;
i.e., the minimum number of elements in the trellis will only be given
by a globally minimal span generator matrix.

References. 

[1] L. R. Bahl, J. Cocke, F. Jelinek and J. Raviv: "Optimal
Decoding of Linear Block Codes for Minimising Symbol Error
Rate"; IEEE Trans., Vol. IT-20, pp 284-287, 1974.

[2] J. K. Wolf: "Efficient Maximum Likelihood Decoding of Linear
Block Codes Using a Trellis"; IEEE Trans. on Inform. Theory,
Vol. IT-24, pp 76-80, January 1978.

[3] J. L. Massey: "Foundation and Methods of Channel
Encoding"; Proc. Int. Conf. Inform. Theory and Systems,
NTG-Fachberichte, Berlin, 1978.

[4] R. J. McEliece: "The Viterbi Decoding Complexity of Linear
Block Codes"; IEEE Int. Symposium on Inform. Theory,
Trondheim, Norway, 1994.

[5] R. J. McEliece: "On the BCJR Trellis for Linear Block
Codes"; pre-print, September 1994.

[6] G. D. Forney: "Coset Codes - Part II: Binary Lattices and
Related Codes"; IEEE Trans. Inform. Theory, Vol. IT-34,
pp 1152-1187, September 1988.

[7] S. Dolinar, L. Ekroot, A. Kiely, W. Lin, R. J. McEliece:
"Trellis Complexity of Linear Block Codes"; in preparation.

[8] L. E. Aguado-Bayon: "Fast Trellis Decoding for Block
Codes"; Transfer Report, University of Manchester, November
1994.

[9] Wei Lin: private communication, January 1994.


Code                   Feature     Lower     Greedy     Wei Lin    Simul.
                                   Bound     Algo.      Algo.      Anneal

(23,12) Golay Code     Edges       1,790     12,284      4,220      3,452
                       Vertices    1,214      8,190      3,134      2,558
                       Span          124        144        133        129

(24,12) Extend. Golay  Edges       2,696     16,380      4,348      3,580
                       Vertices    1,790     12,286      3,262      2,686
                       Span          136        156        140        136

(31,16) BCH Code       Edges       3,198    196,604     42,108      6,268
                       Vertices    2,174    131,070     31,550      4,670
                       Span          186        256        231        195

(32,16) Extend. BCH    Edges       4,789    262,140     22,780      6,396
                       Vertices    3,198    196,606     17,086      4,798
                       Span          202        272        228        202

Table 1: Trellis features for a few codes
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A  linear  block  code  C  of  length  n  is  called  quasi-cyclic  (QC) 
if it is invariant under a cyclic shift of $L$ positions, $T^L$, where
$L < n$. Any cyclic code can be represented by a unique generator
polynomial. In this paper we associate with QC-codes
a polynomial generator set which is a natural generalization
of the generator polynomial of a cyclic code. A canonical generator
matrix of a QC-code which is invariant under $T^L$ is
introduced which shows the symmetric structure of the $n/L$-section
minimal trellis diagram (MTD) [1, 2, 3]. The state
space dimension is nondecreasing on the left half of this trellis.
The canonical generator matrix is important because it
provides considerable information about the trellis complexity
of QC codes as well as the relation between these codes
and convolutional codes.

For a linear block code of length $n$, the interval $[i,j]$,
$1 \le i \le j \le n$, is said to be the support interval of a codeword
$c = (c_1, \dots, c_n)$ if $c_i c_j \ne 0$, and $c_l = 0$ if $l < i$ or $j < l$.
$j - i + 1$ is defined as the support length of $c$, and $c$ is said to
start at time index $i$ and end at time index $j$. $c$ is also said to
be active in the interval $[i, j-1]$.

A generator matrix of a linear block code is called a trellis
oriented generator matrix (TOGM) if no two rows of the
matrix either start or end in the same position [1, 3]. Let $M$
be the TOGM of a linear block code $C$. Denoting the number
of rows of $M$ active at time index $i$ by $s_i$, we define the state
complexity of $C$ to be $s = \max\{s_0, s_1, \dots, s_n\}$.

Let $M$ be a generator matrix for an $(Lm,k)$ QC-code invariant
under $T^L$. Define the $k \times iL$, $1 \le i \le m$, matrices $M_i$
such that the $j$th, $1 \le j \le iL$, column of $M_i$ is the same as
that of $M$. Denote the rank of $M_i$ by $\rho_i$.

Definition 1 (Cyclic Form Code) An $(n,k)$ linear block
code $C$ is called a cyclic form code if in $M$ (the TOGM of
$C$) for any $i$, $1 \le i \le k$, precisely one row of $M$ has support
interval $[i, n-k+i]$. In this case $n-k+1$ is defined as the
effective length of $C$.

Definition 2 (Smallest Regular Trellis Diagram) A
trellis diagram $G$ of a linear block code $C$ is called a smallest
regular trellis diagram (SRTD) of $C$ if: 1) it has the same
state complexity as the MTD of $C$; 2) the number of vertices
of $G$ at time indices $i$ and $j$ are equal, 1 < i,j < n; 3) $G$ has
the maximum number of identical parallel sub-trellises among
all trellises of $C$ which satisfy conditions 1 and 2.

The  following  theorem  is  used  to  determine  the  SRTD  of  a 
QC-code. 

Theorem 1 ([4]) The smallest regular trellis diagram of an
$(n,k)$ linear cyclic form block code $C$ consists of $\max\{1, 2^{2s-k}\}$
structurally identical parallel sub-trellises, where $s$ is the state
complexity of the code. □
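
The count in Theorem 1 is easy to evaluate once the support intervals of the TOGM rows are known. The sketch below assumes that row $i$ of a cyclic form code has support interval $[i, n-k+i]$ and is active on $[i, n-k+i-1]$, as in the definitions above; the function names are illustrative and not from the paper.

```python
def cyclic_form_intervals(n, k):
    """Support intervals [i, n-k+i], i = 1..k, of the TOGM rows of a cyclic form code."""
    return [(i, n - k + i) for i in range(1, k + 1)]


def state_complexity(intervals, n):
    """Maximum over time indices 0..n of the number of active rows.

    A row with support interval [i, j] is active at time indices i..j-1.
    """
    return max(
        sum(1 for (i, j) in intervals if i <= t <= j - 1)
        for t in range(n + 1)
    )


def parallel_subtrellises(n, k):
    """Evaluate max{1, 2^(2s-k)} from Theorem 1 for an (n, k) cyclic form code."""
    s = state_complexity(cyclic_form_intervals(n, k), n)
    exponent = 2 * s - k
    return 1 if exponent <= 0 else 2 ** exponent


print(state_complexity(cyclic_form_intervals(7, 4), 7), parallel_subtrellises(7, 4))
```

For a (7,4) cyclic form code the rows are active on [1,3], [2,4], [3,5] and [4,6], so three rows overlap at time index 3 and the formula gives $2^{2 \cdot 3 - 4} = 4$.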
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The  main  result  of  our  work  is  contained  in  the  following  the¬ 
orem. 


Theorem 2 (Canonical Generator Matrix) Let $C$ be an
$(n,k)$ QC-code invariant under $T^L$, and $n = Lm$. If $M$ is a
TOGM of $C$, then

$$C = \bigoplus_{i=1}^{\rho_1} C_i, \qquad (1)$$

where $C_i$ is a cyclic form code (if it is considered to be a code
of length $m$ with codeword components in $F^L$ [3]), and $C$ has
TOGM

$$M' = \begin{bmatrix} M(c^1) \\ M(c^2) \\ \vdots \\ M(c^{\rho_1}) \end{bmatrix}, \qquad (2)$$

where $M(c^i)$ is a TOGM of the cyclic form code $C_i$ with leading
codeword denoted by $c^i$. The $C_i$'s are called the canonical
components of $C$. The number of canonical components of $C$
of dimension $w$, denoted by $x_w$, is $2\rho_w - (\rho_{w-1} + \rho_{w+1})$. □


The  set  of  polynomials  representing  the  cyclic  form  canonical 
components  of  C  is  defined  as  the  polynomial  generator  set  of 
C. 


Corollary 1 The $m$-section MTD of $C$ consists of $2^{2\rho_1 - \rho_2}$
identical parallel sub-trellises.


Decomposing  C  into  cyclic  form  sub-codes  using  Theorems  1 
and  2,  the  SRTD  of  a  QC-code  C  is  given  in  the  following 
corollary. 

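Corollary 2 The $m$-section SRTD of $C$ consists of $2^{\sum_{i=1}^{\rho_1} a_i}$
structurally identical parallel sub-trellises, with

$$a_i = \begin{cases} m - m_i + 1 & \text{if } m_i \ge \lceil (m+2)/2 \rceil \\ \max\{0,\ 3(m_i - 1) - m\} & \text{if } m_i < \lceil (m+2)/2 \rceil \end{cases}$$

where $m_i$, $1 \le i \le \rho_1$, is the effective length of the $i$-th canonical
component of $C$.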

This  provides  a  decomposition  of  a  QC-code  C  into  its  cyclic 
form  sub-codes  which  can  be  used  to  analyze  the  trellis  struc¬ 
ture  of  the  QC-code. 
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Abstract  —  We  are  able  to  define  minimum  weight 
codewords  of  some  alternant  codes  in  terms  of  solu¬ 
tions  to  algebraic  equations.  Particular  attention  is 
given to the case of the classical Goppa codes. Gröbner
bases are used to solve the system of algebraic
equations. 

I. Words of length n

We consider words of length $n$ over $GF(q)$, $n$ being prime to
$q$. A primitive root $\alpha$ is fixed. The word $c = (c_0, \dots, c_{n-1})$ is
identified with the polynomial $c_0 + c_1 X + \dots + c_{n-1} X^{n-1}$ mod
$X^n - 1$. The Fourier Transform of $c \in GF(q')^n$, denoted $\phi(c)$,
is $A = (A_0, A_1, \dots, A_{n-1})$, $A_i = c(\alpha^i)$, $i = 0 \dots n-1$.

Let $c = (c_0, \dots, c_{n-1}) \in GF(q)^n$. The locators of
$c$ are $\{X_1, \dots, X_w\} = \{\alpha^{i_1}, \dots, \alpha^{i_w}\}$, where $i_1, \dots, i_w$ are
the indices of the nonzero coordinates of $c$. The elementary
symmetric functions of $c$, denoted by $\sigma_1, \dots, \sigma_w$, are

$$\sigma_i = (-1)^i \sum_{1 \le j_1 < \dots < j_i \le w} X_{j_1} \cdots X_{j_i}, \quad i = 1 \dots w.$$

The generalized Newton's identities hold:

$$\forall i \ge 0, \quad A_{i+w} + \sigma_1 A_{i+w-1} + \dots + \sigma_w A_i = 0.$$

We introduce the definition of a spectrally defined code:

Definition 1 Let $C$ be a code in $GF(q)^n$ (or $GF(q')^n$). If
there exist $l$ polynomials in $n$ variables $P_1, \dots, P_l$, such that,
for all $c \in GF(q)^n$ (or $GF(q')^n$), $c$ belongs to $C$ if and only
if $P_1(A_0, \dots, A_{n-1}) = \dots = P_l(A_0, \dots, A_{n-1}) = 0$, where $A = \phi(c)$,
then the code has a spectral definition. The polynomials
$P_1, \dots, P_l$ are the code spectral equations.

Our result, which is a generalization of a case of a cyclic
code [1], is the following theorem:

Theorem 1 Let $C$ be a code defined by the spectral equations
$P_1, \dots, P_l$. Let $S_C(w)$ be the following system of equations:

$$\begin{cases} P_1(A_0, \dots, A_{n-1}) = \dots = P_l(A_0, \dots, A_{n-1}) = 0 \\ A_{i+w} + \sigma_1 A_{i+w-1} + \dots + \sigma_w A_i = 0, \quad i = 0 \dots n-1 \end{cases}$$

with indeterminates $\sigma_1, \dots, \sigma_w, A_0, \dots, A_{n-1}$. Let $A =
(A_0, \dots, A_{n-1})$ be a solution to $S_C(w)$ (i.e. there exist
$\sigma_1, \dots, \sigma_w$ such that $(\sigma_1, \dots, \sigma_w, A)$ is a solution), then $A$ is
the Fourier Transform of a codeword of weight $\le w$.


II. "Spectral Definition" of some alternant codes

Let $\alpha = (\alpha_0, \dots, \alpha_{n-1}) \in GF(q')^n$ be distinct elements in
$GF(q')$, and let $v = (v_0, \dots, v_{n-1})$ be nonzero elements in
$GF(q')$. The generalized Reed Solomon code, $GRS_k(\alpha, v)$, is
the code whose codewords are $(v_0 F(\alpha_0), \dots, v_{n-1} F(\alpha_{n-1}))$,
for all $F \in GF(q')[X]$, $\deg F < k$.

The alternant code $A_k(\alpha, v)$ is the $GF(q)$-subfield subcode
of $GRS_k(\alpha, v)$.

We consider a particular class of alternant codes, the alternant
codes $\Gamma(L, G)$ where $L = \{1, \alpha, \dots, \alpha^{n-1}\}$, the set of all $n$-th
roots of unity. We denote these codes $\Gamma(\alpha, v)$. We get that
the code spectral equations of $A_k(\alpha, v)$ are

$$\begin{cases} \sum_{i+j \equiv t \bmod n} A_i H_j = 0, \quad t = 0 \dots n-k-1 \\ A_{iq \bmod n} = A_i^q, \quad i = 0 \dots n-1 \end{cases}$$

where $H$ is the Fourier Transform of $h$ defining the dual of the
$GRS_k(\alpha, v)$.
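
The generalized Newton's identities of Section I can be checked numerically on a toy case. The sketch below takes the simplest setting in which the Fourier Transform lives in the base field itself: the prime field GF(11) with $n = 10$ and $\alpha = 2$, an element of order 10 modulo 11. These parameters, the test word and the function names are assumptions made for the illustration and do not come from the paper.

```python
from itertools import combinations
from math import prod

P, N, ALPHA = 11, 10, 2  # toy field GF(11), length n = 10, alpha of order 10


def fourier(c):
    """A_i = c(alpha^i) for i = 0..n-1, computed in GF(P)."""
    return [
        sum(cj * pow(ALPHA, i * j, P) for j, cj in enumerate(c)) % P
        for i in range(N)
    ]


def sigmas(c):
    """Signed elementary symmetric functions sigma_1..sigma_w of the locators of c."""
    locators = [pow(ALPHA, j, P) for j, cj in enumerate(c) if cj % P]
    w = len(locators)
    return [
        ((-1) ** i * sum(prod(combo) for combo in combinations(locators, i))) % P
        for i in range(1, w + 1)
    ]


def newton_identities_hold(c):
    """Check A_{i+w} + sigma_1 A_{i+w-1} + ... + sigma_w A_i = 0 for i = 0..n-1.

    Indices of A are taken modulo n, since alpha^n = 1 makes A periodic.
    """
    A, s = fourier(c), sigmas(c)
    w = len(s)
    return all(
        (A[(i + w) % N] + sum(s[j - 1] * A[(i + w - j) % N] for j in range(1, w + 1))) % P == 0
        for i in range(N)
    )


WORD = [0, 3, 0, 0, 7, 0, 1, 0, 0, 0]  # weight-3 word with locators 2^1, 2^4, 2^6
print(fourier(WORD), sigmas(WORD), newton_identities_hold(WORD))
```

The identities hold because each locator is a root of $z^w + \sigma_1 z^{w-1} + \dots + \sigma_w$, and $A_i$ is a linear combination of the $i$-th powers of the locators.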


III.  A  short  Goppa  code 

Since classical Goppa codes with support $L = \{\alpha^i,\ i = 0 \dots n-1\}$
are alternant codes, we are also able to construct
spectral equations for these codes. As an example we study
the Goppa code of length 32, with defining polynomial $g(x) =
x^3 + x + 1$. We index codewords $c$ in the following way: $c =
(c_\infty, c_0, \dots, c_{30})$, where the defining set of the Goppa code is
$L = \{0, 1, \alpha, \dots, \alpha^{30}\}$. Since our result works for a support
of length $n$ prime to 2, we first consider the sub-code $C_{31}$ of
$C$ which is the shortened code with respect to the coordinate
$c_\infty$. This code is also a Goppa code with support $L_{31} =
\{1, \alpha, \dots, \alpha^{30}\}$ and defining polynomial $g(X)$. Thus writing
the system $S_{C_{31}}(7)$, we get equations for codewords such that
$c_\infty = 0$. Computing a Gröbner basis of the system, we get 105
solutions. Next, we want to study minimum weight codewords
such that $c_\infty \ne 0$. The parity check matrix for $C$ is

$$G = \begin{bmatrix} 1 & g(\alpha^0)^{-1} & \cdots & g(\alpha^{30})^{-1} \\ 0 & \alpha^0 g(\alpha^0)^{-1} & \cdots & \alpha^{30} g(\alpha^{30})^{-1} \\ 0 & (\alpha^0)^2 g(\alpha^0)^{-1} & \cdots & (\alpha^{30})^2 g(\alpha^{30})^{-1} \end{bmatrix}$$

We search for words $c_0, \dots, c_{30}$ of weight 6, of length 31 such
that $G' c^t = (1, 0, \dots, 0)^t$, where $G'$ is the parity check matrix
for $C_{31}$. Thus the spectral equations for these codewords are:

$$\begin{cases} \sum_{i+j \equiv 0 \bmod 31} A_i H_j = 1 \\ \sum_{i+j \equiv t \bmod 31} A_i H_j = 0, \quad t = 1, 2 \\ A_{2i \bmod 31} = A_i^2, \quad i = 0 \dots 30 \end{cases}$$

These equations, plus the Newton's identities for the weight 6,
give equations for codewords of $C$ of weight 7 whose support
is not included in [0, 30]. The Gröbner basis gives 23 solutions,
thus 128 codewords of weight 7 for the whole code $C$, as in
the table of [2, p344].
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Abstract  —  The  single  sender  single  receiver  authen¬ 
tication  model  was  extended  by  Desmedt  and  Frankel 
[1]  to  the  case  where  certain  groups  of  persons  are  able 
to  sign  a  message.  The  problem  is  further  developed 
and  discussed  in  [2].  The  unconditionally  secure  group 
authentication  problem  was  formulated  and  investi¬ 
gated  using  the  generalized  vector  space  construction 
in  [3].  We  give  information  theoretic  bounds  on  the 
security  of  a  group  authentication  scheme  and  pro¬ 
pose  a  construction  based  on  the  Shamir  secret  shar¬ 
ing  scheme  and  maximum  rank  distance  codes  (MRD- 
codes). 

I.  Summary 

Let a secret key $K$ be shared among a set of participants
$P$ such that certain subsets of participants are able to compute
the authentication tag $Z = F(M, K)$ of the message $M$,
where $F$ is the authentication function. Denote by $\mathcal{M}$ the
set of messages and by $\mathcal{K}$ the set of secret keys. The receiver
is also assumed to be the dealer of the secret key. To share
a secret key $K \in \mathcal{K}$ he uses a secret sharing scheme to give
each participant the share $K_i$. Denote by $\Gamma$ a monotone access
structure, i.e., the set of qualified groups with monotonic
properties. To authenticate a message each participant $i$ in a
qualified group $X \in \Gamma$ first calculates

$$F_i^X(M, K_i)$$

and sends this to a (not necessarily trustworthy) combiner who
evaluates $Z = C_X(F_i^X(M, K_i);\ i \in X)$, which equals $F(M, K)$.
The output $Z$ of the combiner is the authentication tag, which,
together with the message, is sent to the receiver, who can
check the correctness of a message by calculating $F(M, K)$
directly.

As in ordinary single authentication schemes we measure
the security of a scheme by the probabilities of successful
impersonation and substitution. Denote by $P_I$ the worst case
probability of finding a correct authentication tag given the
knowledge of the shares of any non-qualified group. The
probability of a successful substitution attack is denoted by $P_S$ and
is defined as the worst case probability of finding a correct
authentication tag given the knowledge of the shares from a
non-qualified group.

As in ordinary secret sharing we call a scheme perfect if
$H(K|Y) = H(K)$ for all $Y \notin \Gamma$. Using results on secret sharing
schemes [4] we are able to prove the following theorem on $P_I$
and $P_S$.

Theorem 1 Let $Y \notin \Gamma$ and $K_i \cup Y \in \Gamma$. For a perfect scheme
Pi  >  (1) 

Ki,Y 

Ps  >  max2-"<K*'yz>.  (2) 

”  A\,y  v  y 

1This  work  was  supported  by  the  TFR  Grant  94-457 


We especially consider the situation where the combiner just
adds the partial authentication tags and where the authentication
function $F$ is linear. Let the message $M$ be represented
as an $r \times n$ matrix over $\mathbb{F}_q$ and let the secret key $k$ be a vector
of length $n$ over $\mathbb{F}_q$. Thus,

$$F(M, k) = Mk.$$

Furthermore, assume that the dealer uses the Shamir scheme
[5] to give each participant a share $k_i \in \mathbb{F}_q^n$. The secret key
$k$ may then be calculated as a linear combination

$$k = \sum_{i \in X} \theta_i k_i$$

of $t$ shares from a qualified group $X$, i.e., at least $t$ participants.
By restricting the message matrix to matrices of the form
$[I_r\ M]$, where $I_r$ is the $r \times r$ identity matrix and $M$ is thus an
$r \times (n-r)$ matrix, we translate our scheme to one equivalent to
an authentication function in the well-known form $k_0 + g_M(k_1)$,
where $k_0$ is the vector consisting of the $r$ first elements and $k_1$
a vector consisting of the $n-r$ last elements of $k$. Furthermore,
$g_M : \mathbb{F}_q^{n-r} \to \mathbb{F}_q^r$ belongs to a set of linear functions. For this
situation we are able to prove the following theorem:

Theorem 2 Denote by $\mathcal{A}$ the set of matrices

$$\mathcal{A} = \{M - \hat{M} \mid M \in \mathcal{M},\ M \ne \hat{M} \in \mathcal{M}\}.$$

Then for the group authentication scheme described above

$$P_I = q^{-r}, \qquad (3)$$

$$P_S = q^{-d}, \qquad (4)$$

where $d = \min_{A \in \mathcal{A}} \operatorname{rank} A$.

This relates the problem to codes for the rank metric [6] and
the construction for A²-codes made by Johansson [7]. The above
result can also be obtained with the general technique in [3].
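
The additive combiner with a linear authentication function can be sketched as follows. The example works over a small prime field and shares every coordinate of the key with a Shamir polynomial; each participant multiplies its own share by the message matrix and by its interpolation coefficient, and the combiner only adds the results. The field size, the participant identities and all function names are assumptions made for the illustration, and Python's `random` module stands in for a proper source of randomness.

```python
import random

Q = 101  # hypothetical prime field size


def share_key(k, t, ids, rng):
    """Shamir-share each coordinate of the key vector k with threshold t."""
    polys = [[kj] + [rng.randrange(Q) for _ in range(t - 1)] for kj in k]
    return {
        i: [sum(c * pow(i, e, Q) for e, c in enumerate(p)) % Q for p in polys]
        for i in ids
    }


def coefficient(i, group):
    """Interpolation coefficient of participant i for reconstruction at 0."""
    num = den = 1
    for j in group:
        if j != i:
            num = num * (-j) % Q
            den = den * (i - j) % Q
    return num * pow(den, Q - 2, Q) % Q  # inverse via Fermat's little theorem


def mat_vec(M, v):
    """Matrix-vector product over GF(Q)."""
    return [sum(a * b for a, b in zip(row, v)) % Q for row in M]


def partial_tag(M, share, i, group):
    """Participant i's contribution: coefficient times M applied to its share."""
    c = coefficient(i, group)
    return [c * x % Q for x in mat_vec(M, share)]


def combine(partials):
    """The combiner just adds the partial authentication tags."""
    return [sum(col) % Q for col in zip(*partials)]


rng = random.Random(1)
KEY = [rng.randrange(Q) for _ in range(4)]                   # n = 4
MESSAGE = [[1, 0, 17, 42], [0, 1, 5, 99]]                    # r = 2, form [I_r M]
SHARES = share_key(KEY, t=3, ids=[1, 2, 3, 4, 5], rng=rng)
GROUP = [1, 3, 5]                                            # a qualified group
TAG = combine([partial_tag(MESSAGE, SHARES[i], i, GROUP) for i in GROUP])
print(TAG, mat_vec(MESSAGE, KEY))
```

Linearity is what makes this work: the sum of the weighted partial tags equals the message matrix applied to the reconstructed key, so the combiner never needs to see the key itself.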
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Abstract  —  In  this  paper,  we  extend  the  concept  of  in¬ 
formation  leakage  in  [1],  [2]  to  the  case  of  multi-output 
boolean  functions.  A  spectral  characterization  of  multi¬ 
output  boolean  function  is  given.  This  result  is  used  to 
express  different  forms  of  information  leakage  of  multi¬ 
output  boolean  function  in  terms  of  the  Walsh  transform  of 
every  linear  combination  of  its  output  coordinates.  Condi¬ 
tions  on  the  Walsh  transform  of  the  multi-output  boolean 
function  are  given,  which  imply  that  the  function  satisfies 
certain  cryptographic  properties  of  interest  such  as  balance, 
correlation  immunity,  Strict  Avalanche  Criterion  (SAC)  , 
higher  order  SAC,  Propagation  Criterion  (PC),  higher  order 
PC,  and  Perfect  non-linearity. 

Definitions:

Throughout this paper, let $Y$ be the output of a boolean
function $f(X) : Z_2^n \to Z_2^m$, then

• The Walsh transform of the linear combination of its
output coordinates $c \cdot f(X)$ is defined as¹

$$F_c(w) = \frac{1}{2^{n/2}} \sum_{x \in Z_2^n} (-1)^{c \cdot f(x) \oplus w \cdot x}$$

• The static information leakage of $Y$, given input subvector
$X_k$, is defined by:

$$SL(Y; X_k) = m - H(Y | X_k).$$

• Similarly the dynamic information leakage of $\Delta Y$, given
the input change vector $\Delta X$, is defined by:

$$DL(\Delta Y; \Delta X) = m - H(\Delta Y | \Delta X)$$

where $\Delta Y = Y(X) \oplus Y(X \oplus \Delta X)$.

• The self static/dynamic information leakage of $Y$ is defined
as:

$$SSL(Y) = m - H(Y)$$

$$SDL(Y) = m - H(\Delta Y).$$

Results:

Let $Y$ be the output of a boolean function
$f(X)$; then for $N_y = \#\{X \in Z_2^n \mid f(X) = y\}$,
$N_{xy} = \#\{X \in Z_2^n \mid X_k = x,\ Y = y\}$ and $N_{\Delta x \Delta y} =
\#\{X \in Z_2^n \mid f(X \oplus \Delta x) \oplus f(X) = \Delta y\}$, $x \in Z_2^k$,
$\Delta x \in Z_2^n$, $y \in Z_2^m$, $\Delta y \in Z_2^m$, one can prove that:

$$N_y = 2^{n/2 - m} \sum_{c \in Z_2^m} F_c(0) (-1)^{c \cdot y},$$

$$N_{xy} = 2^{n/2 - m - k} \sum_{c \in Z_2^m} \sum_{w \in Z_2^k} F_c(w^0) (-1)^{w \cdot x + c \cdot y},$$

$$N_{\Delta x \Delta y} = 2^{-m} \sum_{c \in Z_2^m} \sum_{w \in Z_2^n} F_c^2(w) (-1)^{\Delta x \cdot w + \Delta y \cdot c},$$

where $w^0$ denotes the $n$-dimensional vector obtained by
completing the $k$-dimensional subvector $w$ with zeros. For
example if $n = 6$, $x = \{x_0, x_2, x_5\}$ then $w^0 =
\{w_0, 0, w_2, 0, 0, w_5\}$. Using the above results we get:

Theorem 1:

Let $Y$ be the output of a boolean function $f(X)$; then
the different forms of information leakage of $Y$ can be
expressed as:

$$SSL(Y) = m - \dots$$

$$SL(Y; X_k) = m - 2^{-k} \sum_{y \in Z_2^m,\ x \in Z_2^k} \dots$$

$$DL(\Delta Y; \Delta X) = m - \dots$$

where $N_y$, $N_{xy}$, $N_{\Delta x \Delta y}$ are given by the equations above.

¹ To be precise, this is the Walsh Transform of the function $(-1)^{c \cdot f(X)}$.

Let  criterion  “C”  be  any  of  the  following  :  balance,  corre¬ 
lation  immunity  ,  Strict  Avalanche  Criterion  (SAC) ,  higher 
order  SAC  ,  Propagation  Criterion  (PC),  higher  order  PC  , 
or  perfect  nonlinearity. 

Theorem  2: 

If  Y  is  the  output  of  a  multi-output  boolean  function  then 
Y  satisfies  criterion  C  if  and  only  if  every  non  zero  linear 
combination  of  its  output  coordinates  satisfies  criterion  C. 
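
The relation between the output counts $N_y$ and the Walsh transforms of the linear combinations $c \cdot f(X)$ can be verified by brute force on a small function. The sketch below uses the unnormalised transform, i.e. $2^{n/2} F_c(w)$, so that all arithmetic stays in integers, and also evaluates the self static leakage $SSL(Y) = m - H(Y)$. The example functions and all names are assumptions made for the illustration.

```python
from itertools import product
from math import log2


def dot(a, b):
    """Inner product of two bit tuples over GF(2)."""
    return sum(x & y for x, y in zip(a, b)) & 1


def walsh(f, n, c, w):
    """Unnormalised Walsh transform of (-1)^(c.f(x)) at w, i.e. 2^(n/2) * F_c(w)."""
    return sum((-1) ** (dot(c, f(x)) ^ dot(w, x)) for x in product((0, 1), repeat=n))


def counts_direct(f, n, m):
    """N_y = #{x : f(x) = y} by enumeration."""
    counts = {y: 0 for y in product((0, 1), repeat=m)}
    for x in product((0, 1), repeat=n):
        counts[f(x)] += 1
    return counts


def counts_from_walsh(f, n, m):
    """N_y = 2^(-m) * sum over c of W_c(0) * (-1)^(c.y), W_c the unnormalised transform."""
    zero = (0,) * n
    return {
        y: sum(walsh(f, n, c, zero) * (-1) ** dot(c, y)
               for c in product((0, 1), repeat=m)) // 2 ** m
        for y in product((0, 1), repeat=m)
    }


def ssl(f, n, m):
    """Self static information leakage SSL(Y) = m - H(Y) for uniform inputs."""
    probs = [v / 2 ** n for v in counts_direct(f, n, m).values() if v]
    return m - (-sum(p * log2(p) for p in probs))


def unbalanced(x):   # hypothetical example, n = 3, m = 2
    return (x[0] & x[1], x[1] ^ x[2])


def balanced(x):     # hypothetical example, n = 3, m = 2
    return (x[0], x[1] ^ x[2])


print(counts_from_walsh(unbalanced, 3, 2), ssl(unbalanced, 3, 2), ssl(balanced, 3, 2))
```

A balanced function has $H(Y) = m$ and therefore zero self static leakage, while the unbalanced example leaks a positive amount.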
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This  talk  studies  the  application  of 
structures  based  on  error  correcting  codes 
to  systems  where  the  major  requirement  is 
not  error  control  but  secrecy.  In  many 
cases  the  same  code  can  achieve  both  error 
control  and  secrecy.  The  first  section  of 
the  talk  describes  an  optimal  construction 
for  combining  multiple  semi-secure 
channels,  e.g.,  a  bundle  of  fiber-optic 
cables  or  wires  running  through  individual 
conduits,  into  a  single  channel  with  much 
higher  security.  Usually  the  security  of  a 
communications  channel  cannot  be 
guaranteed,  only  promised  with  a  high 
degree  of  probability.  The  first  section 
shows  how  to  combine  semi-secure 
channels  in  such  a  way  that  any 
predetermined  number  may  be 
compromised  before  information  is 
revealed. 

Semi-secure  channels  can  take  on 
many  forms.  Any  conventional  or  public 
key  cryptosystem  used  over  a  public 
channel  is  only  semi-secure,  since  there  is 
currently  no  method  of  proving  that  any 
particular  system  purporting  to  have 
computational  security  is  genuinely  hard 
to  break.  Other  examples  of  semi-secure 
channels  are  copper  wires  running  through 
separate  conduits  pressurized  with  gas  to 
make  tampering  easy  to  detect  and  fiber 
optic  cables,  which  are  intrinsically  fairly 
difficult  to  tap. 

Clearly,  the  maximum  possible  secure 
capacity  of  a  set  of  semi-secure  channels  is 
just  the  sum  of  the  capacities  of  those 
channels  that  are  in  fact  secure.  The  first 
theorem  in  this  talk  states  that  this  bound 
on  total  secure  capacity  is,  in  fact, 
achievable. 

Theorem 1: Given a set of N channels,
each with capacity C, any K of which can be
intercepted by the enemy, it is possible to
form a composite channel of capacity
(N-K)C which is completely secure, even if
neither the sender nor the receiver knows
which channels have been intercepted.

The  very  simple  constructive  proof  uses 
an  (N,K)  maximum  distance  separable 
(MDS)  code  which  can,  by  definition, 
correct  N-K  erasures.  The  K  inputs  to  the 
encoder  come  from  a  source  of  perfect 
randomness,  e.g.,  a  thermal  noise  source 
followed  by  a  hard  limiter.  The  symbols 
sent  over  the  first  K  channels  are  the  first  K 
symbols  produced  by  the  encoder.  If  the 
encoder  is  systematic,  then  these  may  be  just 
the  random  input  symbols  themselves.  The 
symbols  sent  over  the  remaining  N-K 
channels  are  formed  by  adding  one  symbol 
of  information  to  be  transmitted  to  each  of 
the  remaining  symbols  in  the  encoder  output 
and  then  sending  each  of  these  sums  over 
one  of  the  remaining  channels. 
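
The construction in the proof can be sketched with a systematic Reed-Solomon code, which is one example of an MDS code. The code below works over the prime field GF(257), treats the K random symbols as the values of a polynomial of degree less than K at the points 0..K-1, and uses its values at the points K..N-1 as the remaining encoder outputs. The field, the evaluation points and the function names are assumptions made for the illustration, and Python's `random` module is only a stand-in for the source of perfect randomness that the proof requires.

```python
import random

P = 257  # hypothetical prime field size; needs P >= N


def lagrange_eval(xs, ys, x):
    """Value at x of the unique polynomial of degree < len(xs) through (xs, ys), mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(rand_syms, n):
    """Systematic (n, K) Reed-Solomon codeword whose first K symbols are rand_syms."""
    k = len(rand_syms)
    xs = list(range(k))
    return list(rand_syms) + [lagrange_eval(xs, rand_syms, x) for x in range(k, n)]


def send(message, n, k, rng):
    """Return the n channel symbols carrying the n - k message symbols."""
    assert len(message) == n - k
    randomness = [rng.randrange(P) for _ in range(k)]   # stand-in for true randomness
    codeword = encode(randomness, n)
    # First k channels carry the encoder's first k symbols; each remaining
    # channel carries one message symbol added to one remaining encoder symbol.
    return codeword[:k] + [(m + c) % P for m, c in zip(message, codeword[k:])]


def receive(channel_syms, n, k):
    """Re-encode from the first k symbols and subtract to recover the message."""
    codeword = encode(channel_syms[:k], n)
    return [(y - c) % P for y, c in zip(channel_syms[k:], codeword[k:])]


MESSAGE = [10, 200, 33]                      # N - K = 3 information symbols
SENT = send(MESSAGE, n=5, k=2, rng=random.Random(7))
print(SENT, receive(SENT, n=5, k=2))
```

The legitimate receiver sees all N channels and can strip the mask, whereas the MDS property is what the proof relies on to keep any K intercepted symbols uninformative about the message.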

The  concept  of  a  mixing  function  was 
introduced  in  Reference  1  to  improve  the 
security  against  ciphertext-only  attack  of  a 
single  cryptosystem  operating  over  a  single 
channel  by  destroying  the  local  statistics 
which  are  essential  to  assaults  based  on 
letter  or  word  frequency.  The  idea  is  to 
create  a  function  which  mixes  text  so  that 
small  groups  of  letters  appear  totally 
random,  i.e.,  have  maximum  entropy.  The 
talk  proceeds  to  show  how  mixing  and 
scrambling  functions  formed  from  error 
correcting  codes  can  be  used  to  enhance  the 
security  of  trunked  communications  circuits 
and  conventional  cryptographic  systems 
which  depend,  for  their  security,  on 
unproved  assertions  about  computational 
difficulty. 

The  last  segment  of  the  talk  presents  a 
concept  for  applying  information  theoretic 
security  to  spread  spectrum  communications 
and  ranging  systems  so  that  even  an 
intended  recipient  of  the  message  will  not  be 
able  to  jam  the  signal.  Airplane  instrument 
landing  systems  and  other  navigation  signals 
are  an  obvious  potential  application  of  this 
idea. 

1)  C.E.  Shannon,  “Communication 
Theory  of  Secrecy  Systems”  Bell  System 
Technical  Journal,  vol.  28  October  1949 
pp.711-715 
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Abstract  -  In  order  to  solve  the  problem  that  the  DES 
may  be  attacked  by  the  differential  cryptanalysis,  this 
paper  aims  to  design  the  Extended-DES,  first  by 
breaking  a  block  that  is  composed  of  96  bits  into  3 
sub-blocks,  then  performing  different  f  functions  on 
each of the 3 sub-blocks, and finally increasing the S1-S8
of the S-box to S1-S16 which makes it less vulnerable to
attack  by  differential  cryptanalysis. 

I .  Summary 

In  order  to  increase  the  cryptographic  security  of  the  DES, 
this  paper  offers  some  suggestions  as  follows. 

The key is increased from 56 bits to 112 bits by inputting 128 key
bits, which are divided into two 64-bit halves, K1 and K2.
Following the key schedule of the DES, the 64 bits of K1 on the
left lose their 8 parity bits and pass through Permuted Choice 1
(PC-1), which outputs 56 bits. These 56 bits are divided into 28
bits on the left and 28 bits on the right. Each half is then
shifted left in each round by the number of positions given in the
key schedule, and Permuted Choice 2 (PC-2) produces the 48-bit
sub-keys K1,1-K1,16. K2, the 64 bits on the right, yields the
48-bit sub-keys K2,1-K2,16 through the same key schedule. As a
result, when K1,i and K2,i are applied to the f functions on the
left and the right, the following encryption and decryption
formulas are derived:
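Encryption :  $A_i = B_{i-1}$

$B_i = C_{i-1} \oplus f(B_{i-1}, K_{2,i})$

$C_i = A_{i-1} \oplus f(B_{i-1}, K_{1,i})$

Decryption :  $A_{i-1} = C_i \oplus f(A_i, K_{1,i})$

$B_{i-1} = A_i$

$C_{i-1} = B_i \oplus f(A_i, K_{2,i})$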
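
The three-sub-block round structure can be exercised with a toy model. In the sketch below the decryption round is obtained by solving the encryption equations for the previous state, so that each decryption round undoes one encryption round. The round function is a hash-based stand-in and is not the DES f function; the S-boxes, the permutations, the key schedule and the final interchange of sub-blocks are all omitted, and every name is an assumption made for the illustration.

```python
import hashlib


def f(block, subkey):
    """Stand-in 32-bit round function keyed by a 48-bit sub-key (NOT the DES f)."""
    data = block.to_bytes(4, "big") + subkey.to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def encrypt_round(a, b, c, k1, k2):
    """A_i = B_{i-1}; B_i = C_{i-1} xor f(B_{i-1}, K2,i); C_i = A_{i-1} xor f(B_{i-1}, K1,i)."""
    return b, c ^ f(b, k2), a ^ f(b, k1)


def decrypt_round(a, b, c, k1, k2):
    """Invert one round: B_{i-1} = A_i, then remove the two f outputs."""
    return c ^ f(a, k1), a, b ^ f(a, k2)


def encrypt(block, k1s, k2s):
    """Apply the rounds in order to a 96-bit block given as three 32-bit sub-blocks."""
    a, b, c = block
    for k1, k2 in zip(k1s, k2s):
        a, b, c = encrypt_round(a, b, c, k1, k2)
    return a, b, c


def decrypt(block, k1s, k2s):
    """Apply the inverse rounds with the sub-keys in reverse order."""
    a, b, c = block
    for k1, k2 in zip(reversed(k1s), reversed(k2s)):
        a, b, c = decrypt_round(a, b, c, k1, k2)
    return a, b, c


# Hypothetical 48-bit sub-keys for 16 rounds (no real key schedule is modelled).
K1S = [(0x0123456789AB * (i + 1)) % 2 ** 48 for i in range(16)]
K2S = [(0xBA9876543210 * (i + 3)) % 2 ** 48 for i in range(16)]
PLAIN = (0xDEADBEEF, 0x01234567, 0x89ABCDEF)
CIPHER = encrypt(PLAIN, K1S, K2S)
print(CIPHER, decrypt(CIPHER, K1S, K2S) == PLAIN)
```

As in an ordinary Feistel network, f itself never has to be inverted: each round only XORs its output into a sub-block, so the same f serves for encryption and decryption.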

As in the figure, the A16 and B16 of the last round of the
encryption process should be interchanged. The decryption process
remains the same, except that Ai and Bi should be exchanged before
being input into the sub-blocks. The sub-keys have to be input in
the reverse order K1,16, K1,15, K1,14, ..., K1,1, with K1,1 on the
left and K2,1 on the right. S1-S8 on the left and S9-S16 on the
right should also be interchanged. To resist an attack by
differential cryptanalysis, the number of iterations of the f
function performed on each sub-block during the 16 rounds should be
different, which lowers the probability of an N-round
characteristic. In the DES, the f function performed on a sub-block
is repeated 8 times; in the Extended-DES, it is repeated 11 times
from A0 to B16, 10 times from B0 to A16, and 11 times from C0 to
B16. The f function is thus repeated a different number of times
for each of the sub-blocks. Because the DES has the same iteration
number of f functions for each sub-block, it can easily be attacked
by differential cryptanalysis; because the Extended-DES has a
different iteration number of f functions in each sub-block, it can
be said that it resists differential cryptanalysis. Also, in the
Extended-DES, the S1-S8 of the S-box are enlarged to S1-S16, and an
S-box is chosen when each entry satisfies both the SAC and the
correlation coefficient condition.


Fig.  Algorithm  of  the  Extended-DES 


To improve the cryptographic security of the Extended-DES design, the
entries of each S-box are arranged randomly, and the set of S-boxes
that satisfy the SAC as well as the correlation coefficient condition
is enlarged to $S_1$-$S_{16}$. For the SAC, the probability that
output bit $j$ changes when input bit $i$ is complemented is written
$P_{ij} = X/2^n$. The nearer $P_{ij}$ is to 0.5, the closer the S-box
is to satisfying the SAC. The simulation results show that the
Extended-DES satisfies the SAC better than the DES, in that the
$P_{ij}$ of the S-boxes in the Extended-DES are nearer to 0.5. Next,
the output bits of the S-box should be independent of one another,
and a design is considered better the closer the correlation
coefficient $\rho_{ij}(k)$, $-1 \le \rho_{ij}(k) \le 1$, is to zero.
In this paper, the $\rho_{ij}$ of the S-boxes in the Extended-DES are
nearer to zero than those of the DES. Consequently, with respect to
the SAC and the correlation coefficient, the S-boxes of the
Extended-DES are better than those of the DES. The Extended-DES
developed in this paper has been implemented in software, and it has
been verified that its cryptographic security is superior to that of
the DES.
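
The SAC measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sac_matrix` and `sac_deviation` are hypothetical, and `TOY_SBOX` is an arbitrary 4-bit permutation chosen for the example, not an S-box table from the Extended-DES.

```python
def sac_matrix(sbox, n_in, n_out):
    """Return P[i][j]: the fraction of inputs for which output bit j
    flips when input bit i is complemented."""
    size = 1 << n_in
    assert len(sbox) == size
    P = []
    for i in range(n_in):
        counts = [0] * n_out
        for x in range(size):
            diff = sbox[x] ^ sbox[x ^ (1 << i)]  # output bits that changed
            for j in range(n_out):
                counts[j] += (diff >> j) & 1
        P.append([c / size for c in counts])
    return P


def sac_deviation(P):
    """Largest distance of any P[i][j] from the ideal SAC value 0.5."""
    return max(abs(p - 0.5) for row in P for p in row)


# Arbitrary 4-bit permutation, used only for illustration (assumption).
TOY_SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]
# The identity map is the worst case: each input bit flips exactly one output bit.
IDENTITY = list(range(16))

if __name__ == "__main__":
    print(sac_deviation(sac_matrix(TOY_SBOX, 4, 4)))
    print(sac_deviation(sac_matrix(IDENTITY, 4, 4)))
```

A smaller deviation means the table is closer to the SAC in the sense used in the text; comparing two candidate S-boxes then reduces to comparing their deviations.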

References 

[1]  E.  Biham  and  A.  Shamir,  "Differential  Cryptanalysis  of 
DES-like  Cryptosystem,"  Weizmann  Institute  of  Science, 
Technical  Report,  Rehovot,  Israel,  19  July  1990. 

[2]  R.  Forre,  "The  Strict  Avalanche  Criterion:  Special 
Properties  of  Boolean  Function  and  an  Extended 
Definition,"  Proc.  of  Crypto'88,  Springer-Verlag,  pp. 167-173, 
1989. 


1This work was supported by a Chosun University Grant.




Constructions  of  asymmetric  authentication  systems 

Thomas  Johansson1 

Department  of  Information  Theory,  Lund  University,  Box  118,  S-221  00  Lund,  Sweden. 


Abstract — Constructions of asymmetric authentication
systems based on families of mappings with the
vector space property are considered.

I.  Introduction 

Simmons [1] introduced asymmetric authentication systems
when he extended conventional authentication codes to codes
with arbitration, called $A^2$-codes. The term now denotes any
authentication system in which the participants possess different
keys that are, in some way, dependent. Several different
systems of this kind have been considered [2], [3], [4].

II. $A^2$-CODES AND VECTOR SPACES OF MAPPINGS
Let $\mathcal{F} = \{f_i\}$ be a family of functions $f_i : \mathcal{S} \to R$, where $R$ is a
ring. Let $\mathcal{F}$ have the vector space property, i.e., $c_1 f_i + c_2 f_j \in \mathcal{F}$
for any $c_1, c_2 \in R$ and any $f_i, f_j \in \mathcal{F}$, $i \ne j$. We randomly
choose $f, f_1, f_2 \in \mathcal{F}$ and $z \in R$ in such a way that $f = f_1 + z f_2$.
The $A^2$-code is now given as follows. The transmitter has
as his key $E_t$ the pair $(f_1, f_2)$ and the receiver has as his
key $E_r$ the pair $(f, z)$. To send the source state $s \in \mathcal{S}$ the
transmitter generates the message $m = (s, f_1(s), f_2(s))$. The
receiver receives $m = (s, m_2, m_3)$ and checks that $f(s) = m_2 + z m_3$.
In a correct transmission, $m_2 = f_1(s)$, $m_3 = f_2(s)$, and
thus $f(s) = f_1(s) + z f_2(s)$.
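
As an illustration of the key relation $f = f_1 + z f_2$ and the receiver's check, the following Python sketch instantiates the family with affine maps $f(s) = as + b$ over a small prime field. The modulus `Q`, the tuple representation of a function, and all function names are assumptions made for this example only.

```python
import random

Q = 101  # small prime; the ring R is taken to be GF(Q) (assumption)


def evaluate(f, s):
    """Evaluate the affine map f(s) = a*s + b over GF(Q)."""
    a, b = f
    return (a * s + b) % Q


def keygen(rng):
    """Pick f1, f2 and z at random and set f = f1 + z*f2."""
    f1 = (rng.randrange(Q), rng.randrange(Q))
    f2 = (rng.randrange(Q), rng.randrange(Q))
    z = rng.randrange(Q)
    f = ((f1[0] + z * f2[0]) % Q, (f1[1] + z * f2[1]) % Q)
    return (f1, f2), (f, z)  # transmitter key E_t, receiver key E_r


def send(e_t, s):
    """Transmitter: message m = (s, f1(s), f2(s))."""
    f1, f2 = e_t
    return (s, evaluate(f1, s), evaluate(f2, s))


def verify(e_r, m):
    """Receiver: accept iff f(s) = m2 + z*m3."""
    f, z = e_r
    s, m2, m3 = m
    return evaluate(f, s) == (m2 + z * m3) % Q


if __name__ == "__main__":
    e_t, e_r = keygen(random.Random(1))
    m = send(e_t, 7)
    print(verify(e_r, m))
```

The receiver never learns $f_1$ or $f_2$ individually, only their combination $f$ and the multiplier $z$, which is what makes the two keys different but dependent.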

III. Broadcast authentication systems
The idea of broadcast authentication systems was first introduced
by Desmedt and Yung [2]. We generalize their ideas to
include any specified attack. The set of participants $\mathcal{P}$ consists
of a transmitter $T$, a set of receivers $\mathcal{R} = \{R_i\}$, and
possibly a set of other participants $\mathcal{O} = \{O_i\}$. The transmitter
$T$ generates a message $m$, which can be addressed to
any $R_i \in \mathcal{R}$, or to some specified subset of $\mathcal{R}$. The address
is contained in the source state $s$, and changing it implies a
substitution attack. We also specify: how disputes are to be
solved; collaboration sets $\mathcal{C} = \{\mathcal{C}_x\}$ (which collusions of cheating
participants exist against participant $x$); and verification sets
$V_x$ (which participants must be able to verify messages to a
certain receiver $x$).

We now describe the possible attacks, which fall into two classes.
In the first class, some subset of participants
tries to get a fraudulent message accepted by some
receiver, i.e., tries to cheat a receiver. We distinguish two
cases, depending on whether the transmitter is included in the
cheating subset or not. When the transmitter is not included in
the cheating subset, we denote the probability of success
as $P_I(\mathcal{C})$ for the impersonation attack and $P_S(\mathcal{C})$ for the
substitution attack. If the transmitter is included, we denote the
probability of success as $P_T(\mathcal{C})$.

In the second class, a subset collaborates in claiming
to have received a message that was never sent, thus
trying to frame the transmitter. Here we have both the
impersonation case and the substitution case. We denote the
probabilities of success as $P_{R_0}(\mathcal{C})$ and $P_{R_1}(\mathcal{C})$, respectively.

1This  work  was  supported  in  part  by  the  Swedish  Research 
Council  for  Engineering  Sciences  under  Grant  94-457 


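Let $\mathcal{M}(e_t)$ be the set of messages that the transmitter can
generate when he is in possession of the key $e_t$, and let $e(C)$ be
the set of keys for a subset $C$ of participants. The definitions
of the probabilities of success in the different attacks are:

$$P_I(\mathcal{C}) = \max_{R_i} \; \max_{C \in \mathcal{C}_{R_i},\, T \notin C} \; \max_{e(C),\, m} P(m \text{ accepted by } R_i \mid e(C)),$$

$$P_S(\mathcal{C}) = \max_{R_i} \; \max_{C \in \mathcal{C}_{R_i},\, T \notin C} \; \max_{e(C),\, m,\, m',\; m \ne m'} P(m \text{ accepted by } R_i \mid m', e(C)),$$

$$P_T(\mathcal{C}) = \max_{R_i} \; \max_{C \in \mathcal{C}_{R_i},\, T \in C} \; \max_{e(C),\, m,\; m \notin \mathcal{M}(e_t)} P(m \text{ accepted by } R_i \mid e(C)),$$

$$P_{R_0}(\mathcal{C}) = \max_{C \in \mathcal{C}_{R}} \; \max_{e(C),\, m} P(m \in \mathcal{M}(e_t) \mid e(C)),$$

$$P_{R_1}(\mathcal{C}) = \max_{C \in \mathcal{C}_{R}} \; \max_{e(C),\, m,\, m',\; m \ne m'} P(m \in \mathcal{M}(e_t) \mid m' \in \mathcal{M}(e_t), e(C)).$$
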


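An important class of broadcast authentication systems is a
system where the set of participants is $\mathcal{P} = \{T, R_1, R_2, \ldots, R_n, A\}$, and such that: an
honest arbiter $A$ makes decisions in case of a dispute; all attacks
from any subset of at most $k$ participants (excluding the
arbiter) exist; the verification set is $V_i = \{R_1, R_2, \ldots, R_n, A\}$,
$\forall i = 1, \ldots, n$. We call such a system an $(n, k)$-threshold USDS
[4]. Such a system can be constructed by choosing $f_j$ from the family of mappings $\mathcal{F}$
and $z_j^{(i)} \in R$ such that

$$\begin{pmatrix} f^{(1)} \\ \vdots \\ f^{(n)} \end{pmatrix} = \begin{pmatrix} 1 & z_2^{(1)} & \cdots & z_{k+1}^{(1)} \\ \vdots & \vdots & & \vdots \\ 1 & z_2^{(n)} & \cdots & z_{k+1}^{(n)} \end{pmatrix} \begin{pmatrix} f_1 \\ \vdots \\ f_{k+1} \end{pmatrix}, \qquad (1)$$

where any $(k + 1)$ rows are linearly independent. The transmitter's
key is $E_t = (f_1, \ldots, f_{k+1})$ and receiver $i$ has key
$E_{r_i} = (f^{(i)}, z_2^{(i)}, \ldots, z_{k+1}^{(i)})$. The transmitter sends the message
$m = (s, f_1(s), \ldots, f_{k+1}(s))$ and receiver $i$ checks that

$$f_1(s) + z_2^{(i)} f_2(s) + \cdots + z_{k+1}^{(i)} f_{k+1}(s) = f^{(i)}(s).$$

An example of performance is given by the following theorem.
Theorem 1: Let $\mathcal{F} = \{f(s);\ f(s) = as + b,\ \forall a, b \in \mathbb{F}_q\}$. Then

(1) is an $(n, k)$-threshold USDS, where

$$P_I(\mathcal{C}) = P_S(\mathcal{C}) = P_T(\mathcal{C}) = P_{R_0}(\mathcal{C}) = P_{R_1}(\mathcal{C}) = 1/q.$$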

We  further  consider  collaboration  sets  which  are  not  of 
threshold  type. 


References 

[1] G. J. Simmons, “A Cartesian Product Construction for Unconditionally
Secure Authentication Codes that Permit Arbitration”,
Journal of Cryptology, vol. 2, no. 2, 1990, pp. 77-104.

[2] Y. Desmedt, M. Yung, “Arbitrated unconditionally secure authentication
can be unconditionally protected against arbiter’s
attack”, Proceedings of Crypto’90, LNCS 537, pp. 177-188.

[3] Y. Desmedt, M. Yung, “Multi-receiver/Multi-Sender network
security: Efficient authenticated multicast/feedback”, Infocom,
May 1992.

[4] T. Johansson, “Contributions to unconditionally secure authentication”,
Ph.D. Thesis, Lund 1994.




Two  Simple  Schemes  for  Access  Control 

Man  Yiu  Chan  and  Raymond  W.  Yeung 

Department  of  Information  Engineering,  The  Chinese  University  of  Hong  Kong,  N.T.,  Hong  Kong 


I.  Introduction  and  Summary 

In an open information system, all the information in the system
is known to the public. In such a system, the users
are responsible for enciphering their own information in such a way
that it can be accessed by authorized users only. Let
$U = \{U_1, \ldots, U_n\}$ be the set of users in the system. Associated
with each $U_i$ is an authorization list $A_i \subset U$ such that
$U_i$ can access the information of $U_j$ if and only if $U_i \in A_j$.

In this paper, we propose two simple access control
schemes. For the first scheme (Scheme 1), for any $1 \le i, j, k \le n$,

$(U_k \in A_j \text{ and } U_j \in A_i) \Rightarrow U_k \in A_i$. (1)

In other words, if $U_k$ can access the information of $U_j$ and $U_j$
can access the information of $U_i$, then $U_k$ can access the
information of $U_i$. This is called hierarchical accessibility. For
the second scheme (Scheme 2), there is no constraint on the
authorization lists. To our knowledge, this is the first scheme
in the literature that supports arbitrary accessibility.

II. Scheme 1: Hierarchical Accessibility
If the $A_i$, $i = 1, \ldots, n$, satisfy (1), the elements of $U$ have a partial
order, where $U_i > U_j$ signifies that $U_i$ can access the information
of $U_j$. $U_i$ is called a predecessor of $U_j$, and $U_j$ is called
a successor of $U_i$. The scheme we propose is as follows. Each
$U_i$ has an encryption algorithm $E_i$ and a decryption algorithm
$D_i$ which are parametrized by $e_i$ and $d_i$, respectively,
where $e_i$ is publicly revealed and $d_i$ is kept secret by $U_i$. (It
is assumed that the class of encryption/decryption algorithms
that $E_i$ and $D_i$ belong to is publicly known, and $E_i$ and $D_i$
are completely characterized by $e_i$ and $d_i$, respectively.) Further,
the encryption/decryption pair $(E_i, D_i)$ forms a public
key cryptosystem, i.e.,

(PK1) For each message $m$, $D_i(E_i(m)) = m$.

(PK2) $E_i$ and $D_i$ are easy to compute.

(PK3) It is practically impossible to find a decryption algorithm
$D_i'$ from $E_i$ such that $D_i'(E_i(m)) = m$ for all $m$.

The  scheme  works  as  follows.  Let  mi  be  the  information  of 
Ui.  Each  Ui  enciphers  mi  as  Ei(mi)  and  reveals  it  publicly. 
Let  Uj  be  an  immediate  predecessor  of  Ui  (Ui  can  have  more 
than  one  immediate  predecessor).  In  order  that  Uj  can  access 
the  information  of  Ui,  Ui  enciphers  di  as  Ej(di)  and  reveals 
it  publicly.  Then  Uj  can  recover  di  as  Dj(Ej(di))  and  then 
recover  mi  as  Di(Ei(mi)). 

Now suppose $U_j$ is an immediate successor of $U_k$ and $U_i$ is
an immediate successor of $U_j$. Then $U_k$ can recover $d_j$ as described
above. With $d_j$, $U_k$ can also recover $d_i$ as $D_j(E_j(d_i))$,
since $E_j(d_i)$ is publicly known. With $d_i$, $U_k$ can then recover
$m_i$. Likewise, $U_k$ can access the information of any of its
successors.
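
The key chain of Scheme 1 can be sketched as follows. The sketch assumes textbook RSA with tiny, insecure primes as a stand-in for the public key cryptosystem $(E_i, D_i)$; the paper does not fix a particular cryptosystem, and the names `User`, `make_user`, `board` and `recover_as_uk` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    n: int  # public modulus (part of the public parameter e_i)
    e: int  # public exponent (part of the public parameter e_i)
    d: int  # secret decryption parameter d_i


def make_user(p, q, e):
    """Toy textbook-RSA key pair from two small primes (illustration only)."""
    return User(p * q, e, pow(e, -1, (p - 1) * (q - 1)))


def encrypt(n, e, m):
    return pow(m, e, n)


def decrypt(n, d, c):
    return pow(c, d, n)


# Chain U_k > U_j > U_i. The moduli increase along the chain so that each
# secret d fits below the modulus of the immediate predecessor.
u_i = make_user(61, 53, 17)
u_j = make_user(89, 97, 5)
u_k = make_user(101, 103, 7)

m_i = 1234  # the information of U_i
board = {   # everything on the board is public
    "E_i(m_i)": encrypt(u_i.n, u_i.e, m_i),
    "E_j(d_i)": encrypt(u_j.n, u_j.e, u_i.d),  # published by U_i for U_j
    "E_k(d_j)": encrypt(u_k.n, u_k.e, u_j.d),  # published by U_j for U_k
}


def recover_as_uk(board, d_k):
    """U_k walks down the hierarchy using only its own secret d_k."""
    d_j = decrypt(u_k.n, d_k, board["E_k(d_j)"])
    d_i = decrypt(u_j.n, d_j, board["E_j(d_i)"])
    return decrypt(u_i.n, d_i, board["E_i(m_i)"])


if __name__ == "__main__":
    print(recover_as_uk(board, u_k.d) == m_i)
```

Only one published ciphertext per immediate-successor relation is needed, which matches the storage claim made for the scheme.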

Different hierarchical access schemes have been proposed
in the literature ([1,2], [4]-[7]). All these schemes have the
common property that key management in the system is performed
by a central authority. By contrast, our scheme is
completely decentralized and does not need a central authority.
In addition, our scheme has the following advantages:


1.  Users  are  allowed  to  choose  their  own  keys. 

2.  It  is  not  necessary  to  deliver  keys  to  the  users  in  a  secure 
way  (cf.  for  example  [1,2]). 

3.  The  amount  of  storage  required  is  proportional  to  the 
total  number  of  immediate  successors  in  the  system. 

4. Insertion and deletion of users are simple, and do not
affect the encryption and decryption procedures of existing
users.

5.  Update  of  encryption  and  decryption  keys  is  simple. 

III. Scheme 2: Arbitrary Accessibility
We assume that $A_i$, $i = 1, \ldots, n$, are arbitrary. In this scheme,
each $U_i$ has two encipher algorithms $E_{1i}$ and $E_{2i}$, and two
decipher algorithms $D_{1i}$ and $D_{2i}$, which are parameterized by
$e_{1i}$, $e_{2i}$, $d_{1i}$ and $d_{2i}$, respectively. $e_{2i}$ is revealed publicly,
while $e_{1i}$, $d_{1i}$ (called the file decryption key) and $d_{2i}$ are kept
secret by $U_i$. $(E_{2i}, D_{2i})$ forms a public key cryptosystem, while
$(E_{1i}, D_{1i})$ forms a conventional cryptosystem.

The scheme works as follows. Each $U_i$ enciphers its information
$m_i$ as $E_{1i}(m_i)$ and reveals it publicly. For each
$U_j \in A_i$, $U_i$ enciphers $d_{1i}$ as $E_{2j}(d_{1i})$ and reveals it publicly.
Then $U_j$ can recover $d_{1i}$ as $D_{2j}(E_{2j}(d_{1i}))$, and then recover
$m_i$ as $D_{1i}(E_{1i}(m_i))$. It is easy to see that if $U_j \notin A_i$, then $U_j$
cannot access the information of $U_i$.
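
A minimal sketch of Scheme 2, assuming an XOR mask as the conventional cryptosystem $(E_{1i}, D_{1i})$ and toy textbook RSA as the public key cryptosystem $(E_{2i}, D_{2i})$; both choices, the user names and the helper functions are illustrative assumptions, not part of the paper.

```python
def make_pk(p, q, e):
    """Toy textbook-RSA key pair (illustration only, not secure)."""
    return {"n": p * q, "e": e, "d": pow(e, -1, (p - 1) * (q - 1))}


def pk_encrypt(key, m):
    return pow(m, key["e"], key["n"])  # uses only the public part


def pk_decrypt(key, c):
    return pow(c, key["d"], key["n"])  # uses the secret d_2


def xor_cipher(file_key, m):
    """Conventional cipher: enciphering and deciphering are the same XOR."""
    return m ^ file_key


users = {
    "U1": {"pk": make_pk(61, 53, 17), "file_key": 0x5A3, "m": 0x123},
    "U2": {"pk": make_pk(89, 97, 5), "file_key": 0x2F1, "m": 0x456},
    "U3": {"pk": make_pk(101, 103, 7), "file_key": 0x7C4, "m": 0x789},
}
# Authorization lists A_i: who may read U_i. Deliberately not transitive.
auth = {"U1": {"U2"}, "U2": {"U3"}, "U3": set()}

# Public board: enciphered files plus one enciphered file key per grant.
files = {i: xor_cipher(u["file_key"], u["m"]) for i, u in users.items()}
grants = {(i, j): pk_encrypt(users[j]["pk"], users[i]["file_key"])
          for i in users for j in auth[i]}


def read(reader, owner):
    """Return the owner's information if a grant exists, else None."""
    if (owner, reader) not in grants:
        return None  # no E_2j(d_1i) was published for this reader
    file_key = pk_decrypt(users[reader]["pk"], grants[(owner, reader)])
    return xor_cipher(file_key, files[owner])


if __name__ == "__main__":
    print(read("U2", "U1") == users["U1"]["m"])
    print(read("U3", "U1"))
```

Here U3 learns only the file key of U2, not the secret $d_{2}$ of U2, so the grant that U1 published for U2 stays unreadable to U3; this is how the two separate key pairs break the hierarchy.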

In  Scheme  1,  a  user  uses  the  same  encryption/decryption 
pair  for  both  its  own  information  and  the  decryption  keys  of 
its  immediate  successors.  For  this  reason,  a  user  can  access 
the  information  of  all  its  successors.  In  Scheme  2,  however, 
two  different  encryption/decryption  pairs  are  used  for  its  own 
information  and  the  file  decryption  keys  of  those  users  whose 
information  it  can  access.  This  arrangement  breaks  up  the 
hierarchical  structure  of  the  scheme. 

To our knowledge, this is the first scheme in the literature
that supports arbitrary accessibility. This scheme enjoys all
the advantages of Scheme 1 except that the amount of storage
required is proportional to $\sum_i |A_i|$, which is upper bounded
by $n^2$.
Abstract — We propose two different methods for attacking
Tanaka’s IDNIKS presented in SCIS’94. One
is to find the secret information using public parameters,
and the other is to find the center’s secret keys
by collusion.


I.  Introduction 

The Identity-based Non-Interactive Key Sharing (IDNIKS) scheme
was first proposed by Blom [1]. Since then, there have been
many works on IDNIKS [2], [3], but many of them were found
to be breakable by collusion attacks [4], [6].

In SCIS’94, H. Tanaka [5] proposed a new IDNIKS which
could be easily implemented. We propose two methods for
attacking Tanaka’s IDNIKS. The first method is to find the
secret information using public parameters of the center, and
the other is to find the center’s secret keys by 8 collaborators.


II.  Tanaka’s  IDNIKS 


At  first,  a  center  chooses  RSA  modulus  N(=  PQ),  one¬ 
way  hash  function  /,  random  number  e,ei,e2  satisfying 
gcd(ei,e2)  =  gcd(ei,e)  =  1.  And  choose  x,  y,  d,  r\ ,  r2  sucli 
that  a;  =  {r\L)l  gcd(e* (cc2-f  Cj  )n ,  L),  y  =  (r2 L)/ gcd(e|(ee2  + 
e  1 ) r'2  >  L)jd  =  Lj  gcd(ee2  —  e\ ,  A),  and  keep  them  secret.  Cen¬ 
ter  selects  a  random  number  rA  for  entity  A,  and  calculates 


the  secret  keys  gAi  and  gA2  such  that  gA i  =  r~dgxIAi 


gA2 


rAd9yI*-  where  IA1  =  e1f{IDA)  +  e  =  txfA  +  e,IA2  = 
e2  Ja  +  1. 

The common key $K_{AB}$ between A and B can be obtained


as  follows  :  =  Jp  ,/j?  =  =  K 


-(B) 
AB  ■ 


III.  Attacking  Methods 

Method  1  :  Assume  that  gcd(ej(ee2  +  ei),  L)  =  gcd(ei(ee2  + 
ci),  L),  gcd(e2(ec2+ei),  L)  =  gcd(e2(ee2+ei ),  L).  In  RSA  type 
modulus  N ,  the  previous  equations  hold  with  high  probabil¬ 
ity.  Then,  the  following  relations  hold  :  x e i in  —  yc2m  =  0 
(mod  L)  =►  K™b  =  (mod  where  m  denotes 

$\gcd(e e_2 + e_1, L)$. Thus the $m$-th powers of the common keys
between any entities have the same value. If $m$ is small enough,
then the common keys between entities will have the same value
with high probability.

Method 2 : Now we consider a collusion attack. First, an
entity A builds the following equations for the common key
$K_{AB}$ between two entities A and B:


Xo  =  ^2*ee?+2 ye%  ^ 


V 3 


,V,  =  A's  =  i Xo  = 

YAl  =  x{*XjAXa,  Ym  =  x£aX{aX :b,YM  =  xfAX(AX0, 
Xad  =  (x{A  X'AX3)fB(x£x{AX5)fD  (x£  x£ax0) 

=  y£?Y^YA3  (mod  71). 

The  attacking  procedure  consists  of  the  following  3  steps  : 


step 1. As above, any entity A can obtain $K_{AB}$, $K_{AC}$,
$K_{AD}$, $K_{AE}$, $K_{AF}$, $K_{AG}$, $K_{AH}$ and $K_{AI}$. Using them,
$Y_{A1}$, $Y_{A2}$, $Y_{A3}$ can be easily obtained.

step  2.  By  collaboration,  we  obtain  Xf ,  X\,  X3,  X*,  X£, 
and  Xq. 

step  3.  Then  a  common  key  Kuv  for  any  entities  U  and  V 
can  be  expressed  : 

Kuv  =  X  X24  C2 + 1'2  X“C3 + *'3  XlCA + 4  X52cs + 15  X6 , 

where $0 \le i_1, i_2, i_4 < 4$ and $0 \le i_3, i_5 < 2$. Let $a = f_U \bmod 4$
and $b = f_V \bmod 4$. Then in case $(a, b) = (0, 0)$, $(0, 2)$, $(2, 0)$ or
$(2, 2)$, $i_1, \ldots, i_5$ are all zeros, and so $K_{UV}$ can be calculated.
All the other cases are classified as follows:

0)  (°>  1),  (0,  3),  (1,  0),  (3,  0)  :  AT  AT 
(»)  (1,  1).  (3,  3)  :  X 1X2X4 

(iii)  (1,  2),  (2,  1),  (2,  3),  (3,  2)  :  X22A3X42AT 

(iv)  (1,  3),  (3,  1)  :  XiXl 

Hence Tanaka’s scheme can be broken if 8 entities (A, B, C, D, E,
F, G, and H) collude, whose hashed values satisfy
$f_A \bmod 4 = 0$, $f_B \bmod 4 = 1$, $f_C \bmod 4 = 2$, $f_D \bmod 4 = 3$,
$f_E \bmod 4 = 0$, $f_F \bmod 4 = 1$, $f_G \bmod 4 = 2$, $f_H \bmod 4 = 3$,
and $\gcd(R, S) = 2$, where $R = (f_A - f_B)(f_C - f_D)(f_A + f_B - f_C - f_D)$
and $S = (f_E - f_F)(f_G - f_H)(f_E + f_F - f_G - f_H)$.

IV.  Conclusion 

In this paper, we introduced two different ways of attacking
Tanaka’s IDNIKS. First, we have shown how to get the secret
information from the public parameters of the center. Second,
even if it is impossible to get the secret information from the
public parameters, we have shown that Tanaka’s scheme can
be broken by 8 collaborators.
Abstract — A new, simply implemented identity-based
non-interactive key sharing scheme (IDNIKS)
is proposed. The center algorithm is very simple
and easily implemented. The security depends on
the difficulty of factoring. The proposed IDNIKS can
be certified to be secure against the conceivable attacks
involving user collusion.

I.  Introduction 

In this paper a new identity-based non-interactive key sharing
scheme (IDNIKS) is proposed in order to realize
Shamir’s original concept of an identity-based cryptosystem [1].
The algorithm is very simple and easily implemented. The
security depends on the difficulty of factoring, and the scheme can be
certified to be secure against user collusion.

II.  Basic  Center  Algorithm 
Let $P$ and $Q$ be two large primes and their product be
$N = PQ$. Then the Carmichael function of $N$ is given by
$L = \mathrm{LCM}\{P-1, Q-1\}$. Let $g$ be a primitive element in GF($P$)
and GF($Q$), and let $n$ and $w$ ($\approx n/2$) be two positive integers
which satisfy $\gcd\{w, L\} = 1$. We assume here that the
identity information of each user $l$ ($l$ = A, B, C, ...) is given
by $ID_l$, and introduce a one-way function $f$ which satisfies
$0 \le I_l = f(ID_l) \le {}_nC_w - 1$. Then using the Schalkwijk
algorithm [2], we obtain a constant weight binary vector
$v_l = (a_{l,0}, a_{l,1}, \ldots, a_{l,n-1})$, $a_{l,j} \in$ GF(2), from $I_l$, where the Hamming
weight of $v_l$ is $w$. We assume here that any $n$ vectors
$v_l$ are linearly independent. From the vector $v_l$ an index set
can be defined as $J_l = \{j \mid a_{l,j} = 1,\ 0 \le j \le n-1\}$. Here
we introduce a set of random numbers $X = \{x_0, x_1, \ldots, x_{n-1}\}$,
and calculate the following equations.

$$S_l = \sum_{j \in J_l} x_j \pmod{L} \qquad (1)$$

$$g_l(j) = g^{x_j S_l} \pmod{N} \quad (0 \le j \le n-1) \qquad (2)$$

Finally the trusted center publishes $\{N, n, w, f, ID_l\ (l = \text{A, B, C, ...})\}$
and delivers $G_l = \{g_l(j);\ 0 \le j \le n-1\}$ to each user $l$
through a secure channel or by an IC card.

III. Non-Interactive Key Sharing
We assume here that two users A and B want to share a
common-key $K_{AB}$ between them non-interactively. First A
calculates $J_B$ from $ID_B$ using $f$ and the Schalkwijk algorithm,
and then performs the following simple calculation to share a
common-key $K_{AB}$ with B.

$$K_{AB}^{(A)} = \prod_{j \in J_B} g_A(j) \pmod{N} = g^{S_A \sum_{j \in J_B} x_j} \pmod{N} = g^{S_A S_B} \pmod{N} \qquad (3)$$

Similarly B calculates

$$K_{AB}^{(B)} = \prod_{j \in J_A} g_B(j) \pmod{N} = g^{S_B S_A} \pmod{N}. \qquad (4)$$

Then their shared common-key is given by

$$K_{AB} = g^{S_A S_B} \pmod{N}. \qquad (5)$$
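
The agreement between the two computations can be checked numerically with a small Python sketch. All concrete values are assumptions for illustration: the primes are tiny, the base `g` is merely coprime to $N$ (it is not checked to be primitive), and the index sets `J` are hard-coded instead of being derived from identities through $f$ and the Schalkwijk algorithm.

```python
from math import gcd

P, Q = 23, 47                                 # toy primes (assumption)
N = P * Q
L = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)    # L = LCM{P-1, Q-1}
g = 5                                         # illustrative base, coprime to N
n, w = 6, 3                                   # vector length and Hamming weight
assert gcd(w, L) == 1 and gcd(g, N) == 1

X = [17, 101, 256, 33, 420, 77]               # the center's secret numbers x_j
J = {"A": [0, 2, 5], "B": [1, 2, 4]}          # hypothetical index sets, weight w


def center_secret(user):
    """S_l = sum of x_j over j in J_l, reduced mod L."""
    return sum(X[j] for j in J[user]) % L


def user_keys(user):
    """G_l = { g^(x_j * S_l) mod N : 0 <= j <= n-1 }, delivered to user l."""
    s = center_secret(user)
    return [pow(g, (X[j] * s) % L, N) for j in range(n)]


def shared_key(own_keys, peer):
    """Multiply the own secrets g_l(j) over the peer's index set."""
    k = 1
    for j in J[peer]:
        k = (k * own_keys[j]) % N
    return k


if __name__ == "__main__":
    g_a, g_b = user_keys("A"), user_keys("B")
    print(shared_key(g_a, "B") == shared_key(g_b, "A"))
```

Each user needs only the peer's public identity (here, the peer's index set) and its own delivered values, so no message exchange is required before the key is used.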

IV.  Considerations  on  the  Security 

In order to certify the security of our proposed IDNIKS we
must show that a common-key between any third parties cannot
be forged under the following assumptions, even if
collusion among users is allowed.

Assumptions

A1. Factoring of $N = PQ$ is too difficult to execute.

A2. Any set of less than or equal to $n$ vectors is linearly
independent.

The possible strategies to forge a common-key between any
third parties X and Y are only the following two attacks.

Attack 1: to solve, for $z_{ij} = g^{x_i x_j} \pmod{N}$ ($0 \le i, j \le n-1$),
the simultaneous equations given by $G_l$ or by common keys gathered
through user collusion, and to construct a desired common-key
between X and Y using their index sets $J_X$ and $J_Y$.

Attack 2: to gather many $g_l(j)$ or $K_{lY}$ and forge a user X’s
secret $g_X(j)$ or a common-key $K_{XY}$ by replacing the exponent
part using a linear combination $v_X = a v_A + b v_B + c v_C + \cdots \pmod{L}$,
where $a, b, c, \ldots$ are integer coefficients. For example,
$K_{XY}$ seems to be forged by

$$K_{XY} = K_{AY}^{a} K_{BY}^{b} K_{CY}^{c} \cdots \pmod{N}. \qquad (6)$$

In  the  process  of  executing  the  attack  1,  we  are  inevitably 
confronted  with  solving  an  equation 

=  C  (mod N)  or  zfj  =  C  (mod N),  (7) 

where $C$ is some known factor. However, we cannot calculate the
$w$-th root of $C$ because the inverse element (mod $L$) of $w$ is
unknown under the assumption A1. Such a situation is the
same as that of the RSA public-key cryptosystem.

In order to obtain the integer coefficients $a, b, c, \ldots$ for the
execution of attack 2, we must solve the equation $(a, b, c, \ldots)V = v_X \pmod{L}$,
where $V$ is an $n \times n$ matrix of which each row
vector is $v_l$. However, it is impossible to solve it for $(a, b, c, \ldots)$
because, from the assumption A2, $|V|$ is always equal to $w$
or $-w$, whose inverse element (mod $L$) cannot be obtained
because of the assumption A1. Such a situation is the same
as that of the RSA public-key cryptosystem.
It is important in cryptology to study the degeneration
of multi-valued logical functions (MVLF). The
main purpose of this article is, with the help of the Chrestenson
spectrum, to reveal the relationship between the degeneration
of an MVLF and its linear structures, and to
characterize the properties of these linear structures. The
discussion hereafter is restricted to the prime field
GF($p$).

Definition 1: Assume an MVLF $f: GF^n(p) \to GF(p)$.
The Chrestenson transform and inverse transform
are

$$S_f(\omega) = p^{-n} \sum_{x} u^{f(x)} \, \overline{u^{\langle \omega, x \rangle}}$$

and

$$f(x) = \log_u \Big( \sum_{\omega} S_f(\omega) \, u^{\langle \omega, x \rangle} \Big)$$

respectively, where $u = \exp(2\pi i / p)$, $\langle \omega, x \rangle$ denotes
the inner product of the vectors $\omega$ and $x$, and $\overline{u^{\langle \omega, x \rangle}}$ the
conjugate of $u^{\langle \omega, x \rangle}$.

In the following description, $\oplus$ means addition modulo $p$,
and $+$ the ordinary addition. $f$ is an MVLF
$GF^n(p) \to GF(p)$.

Definition 2: $f(x)$ is said to be degenerate if
there exists a $k \times n$ ($k < n$) matrix $D$ and an MVLF
$g(y)$ over $GF^k(p)$ such that $f(x) = g(Dx) = g(y)$,
$\forall x \in GF^n(p)$, where $y = Dx$.

Definition 3: Let $W_i = \{x \in GF^n(p) \mid f(x) = i\}$,
$0 \le i \le p-1$, and $|W_i|$ be the number of
elements in $W_i$. $f(x)$ is said to be balanced if
$|W_0| = |W_1| = \cdots = |W_{p-1}|$.

Definition 4: $a \in GF^n(p)$ is referred to as a linear
structure of $f$ if $f(x \oplus a) - f(x) = \text{constant}\ (= f(a) - f(0))$,
$\forall x \in GF^n(p)$.

Let $U_f$ be the set of all linear structures. An immediate
conclusion from the definition is that $U_f$ is a linear
subspace of $GF^n(p)$. Let $U_i = \{a \in GF^n(p) \mid f(x \oplus a) - f(x) = i,\ \forall x \in GF^n(p)\}$,
$0 \le i \le p-1$. The elements
of $U_i$ are referred to as the $i$th class of linear
structures. Obviously, the difference between any two points
in $U_i$ ($1 \le i \le p-1$) belongs to $U_0$. Hence, if $U_i \ne \emptyset$,
then $U_i = \beta + U_0$ for any $\beta \in U_i$. So $U_f$ is the union of $U_0$ and some of
its cosets.

Theorem 1: Let $V = \langle \{\omega \mid S_f(\omega) \ne 0\} \rangle$, $\dim V = k$,
and $H = [h_1, h_2, \ldots, h_k]^T$, where $h_1, h_2, \ldots, h_k$ is a
basis of $V$. Then there exists a function of $k$
variables $g(y): GF^k(p) \to GF(p)$ such that $g(y) = g(Hx) = f(x)$.

Theorem 2: $\dim \langle \{\omega \mid S_f(\omega) \ne 0\} \rangle = k$ if and only
if $f(x)$ degenerates into a function with at most $k$ variables.

Theorem 3: $a \in U_i$ if and only if $S_f(\omega) = 0$, $\forall \omega \in GF^n(p)$,
$\langle \omega, a \rangle \ne i$, where $U_i$ is the set of the $i$th class
linear structures of $f(x)$.

Theorem 4: $U_0 = \{a \mid Ha = 0\} = \langle \{\omega \mid S_f(\omega) \ne 0\} \rangle^{\perp}$.

By Theorem 2 and Theorem 4 it is known that the
0th class linear structure essentially characterizes the degree
of degeneration of the function $f(x)$. The function is
degenerate whenever $U_0 \ne \{0\}$. Meanwhile, it is pointed
out that $U_0 = \langle \{\omega \mid S_f(\omega) \ne 0\} \rangle^{\perp}$.

Corollary: $\dim U_f = n$ if and only if $f$ is a linear
function, i.e., $f(x) = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n + c_0$,
where $c_i \in GF(p)$.

Theorem 5: If $f(x)$ has an $i$th ($i \ne 0$) class linear
structure, then (1) $f(x)$ must have linear structures of the other
classes; (2) $f(x)$ is balanced.

By Theorem 5, if $U_f \ne U_0$, then all classes of linear
structures exist. Once a 1st class linear structure $a$ is
found, $ka$ ($0 \le k \le p-1$) is a $k$th class linear structure.
Thus, if $U_0$ and one $a \in U_1$ are determined, all of $U_f$ can be
determined.
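
The relation $U_0 = \langle \{\omega \mid S_f(\omega) \ne 0\} \rangle^{\perp}$ can be checked by brute force on a small case. The sketch below assumes $p = 3$, $n = 2$ and the example function $f(x) = (x_1 + x_2)^2 \bmod 3$, which depends only on $x_1 + x_2$ and is therefore degenerate; the function and all names are chosen for illustration and do not come from the paper.

```python
import cmath
from itertools import product

p, n = 3, 2
u = cmath.exp(2j * cmath.pi / p)               # primitive p-th root of unity
points = list(product(range(p), repeat=n))     # all vectors of GF^n(p)


def f(x):
    """Example MVLF (assumption): depends only on x1 + x2, hence degenerate."""
    return ((x[0] + x[1]) ** 2) % p


def inner(a, b):
    return sum(ai * bi for ai, bi in zip(a, b)) % p


def spectrum(func):
    """Chrestenson spectrum S_f(w) = p^-n * sum_x u^f(x) * conj(u^<w,x>)."""
    return {w: sum(u ** func(x) * (u ** inner(w, x)).conjugate()
                   for x in points) / p ** n
            for w in points}


def linear_structure_class(func, i):
    """U_i = { a : f(x + a) - f(x) = i (mod p) for every x }."""
    def shifted(x, a):
        return tuple((xi + ai) % p for xi, ai in zip(x, a))
    return {a for a in points
            if all((func(shifted(x, a)) - func(x)) % p == i for x in points)}


S = spectrum(f)
support = {w for w, value in S.items() if abs(value) > 1e-9}
U0 = linear_structure_class(f, 0)
# Vectors orthogonal to every spectral support point (hence to its span).
orthogonal = {a for a in points if all(inner(w, a) == 0 for w in support)}

if __name__ == "__main__":
    print(sorted(support), sorted(U0), U0 == orthogonal)
```

Because the support spans a one-dimensional space here, the example also agrees with the statement that $f$ degenerates into a function of a single variable.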
Abstract — Many cryptographic protocols depend on one and the same
problem, that of factoring. This paper presents a new identification
scheme whose security depends on an NP-complete problem from the
theory of error correcting codes: the syndrome decoding (SD) problem. The
computational complexity of the proposed scheme is smaller than that of
the other schemes based on the SD problem. Moreover, the amount of memory
needed by the prover is very small.

I.  Introduction 

We define, in this paper, a new identification scheme based on the
syndrome decoding (SD) problem [1]. The decision version of the SD
problem (stated in terms of a generator matrix) is the following:

Let $G(k,n)$ be a generator matrix of a random linear binary code. Let
$p$ be an integer and $x$ be a random binary vector of length $n$. Does
there exist a word $e$ of length $n$ and weight $p$ such that $x + e$ belongs to
the code generated by $G$? Thus the problem is to know whether there exists
a couple $(m, e)$ such that $x = mG + e$, where $e$ is a word of weight $p$.
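
For very small parameters the decision problem can be answered by exhaustive search, which makes the statement concrete. The sketch below is only an illustration: the generator matrix `G_EXAMPLE`, the vector `X_EXAMPLE` and the function names are assumptions, and the search is exponential in $k$, in line with the hardness that the scheme relies on.

```python
from itertools import product


def encode(m, G):
    """Compute the codeword mG over GF(2)."""
    n = len(G[0])
    return tuple(sum(m[i] * G[i][j] for i in range(len(G))) % 2
                 for j in range(n))


def sd_decision(G, x, p):
    """Return a couple (m, e) with x = mG + e and weight(e) = p, or None.

    Brute force over all 2^k messages; usable only for tiny k."""
    k = len(G)
    for m in product((0, 1), repeat=k):
        c = encode(m, G)
        e = tuple((xi + ci) % 2 for xi, ci in zip(x, c))
        if sum(e) == p:
            return m, e
    return None


# Toy instance (assumption): a [5, 2] binary code and a received vector.
G_EXAMPLE = [
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
]
X_EXAMPLE = (1, 1, 1, 1, 1)

if __name__ == "__main__":
    print(sd_decision(G_EXAMPLE, X_EXAMPLE, 1))
    print(sd_decision(G_EXAMPLE, X_EXAMPLE, 3))
```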
We  will  use  the  following  definitions  : 

. Let $\beta = \{\beta_1, \ldots, \beta_k\}$ be a basis of $F_{2^k}$ and $y = \sum_i y_i \beta_i$ be an
arbitrary element of $F_{2^k}$; the $\beta$-weight of $y$, $\omega_\beta(y)$, is defined as the
Hamming weight of $(y_1, \ldots, y_k)$.

. Let $y$ be an arbitrary element of $F_{2^k}$. The $\beta$-product matrix of $y$,
$[y]_\beta$, is defined as the Kronecker product $\beta^t \otimes y$ where, for $1 \le i \le k$,
$(\beta^t \otimes y)_i$ is considered as a row of $k$ elements over $F_2$, and $\beta^t$ denotes
the transpose of the row vector $\beta$. Thus $[y]_\beta$ is a $k \times k$ invertible
binary matrix.

. Let $y$ be a fixed element of $F_{2^k}$. Let $p = \sum_i p_i \beta_i$ be an arbitrary
element of $F_{2^k}$; then $py = (p_1, \ldots, p_k)[y]_\beta$.

II. The Identification Scheme

Notations 

. From now on, a binary vector of length $k$ will be considered as an
element of $F_{2^k}$ if needed (and vice versa),

. Let $y$ be a vector of length $n$ and $\sigma$ be a permutation over $\{1, \ldots, n\}$;
then $y\sigma$ is defined as the vector $z$ such that $z_j = y_{\sigma(j)}$. Likewise, if
$M$ is an $m \times n$ matrix then $M\sigma = (m_{i\sigma(j)})$,

. $\langle x \rangle$ denotes the action of a hash function over the string $x$,

. A vector $x$ of length $2k$ will be represented by the couple $(x_1, x_2)$
where $x_1$ and $x_2$ are vectors of length $k$.

Let  P\  =  {1,  a, . . . ,  a*-1}  be  a  basis  of  F# ,  ft  be  a  basis  of  F2 »  and  £2* 
be  its  dual  trace  basis.  A  certification  center  C,  having  the  confidence 
of  all  users,  computes  two  random  elements  yi  and  y2  of  F2k.  Let  S  be 

equal  to  ^  ^  ^  and  G'  be  a  random  kx2k  binary  matrix  of  rank  k.  C 

computes  G  =  SG'.  This  matrix  is  common  to  all  users.  S  and  G'  are 
no  longer  needed  and  are  unknown  to  all  users.  Finally,  C  computes 
for  each  user  :  a  random  binary  vector  w  —  (wi,  u2)  of  length  2k, 
which  verifies  uS  =  0,  and  the  matrix  G  =  [p]p*(G  \  Q)tc2,  where 
it 2  is  a  random  permutation  of  {1,  . . . ,  4 k),  and  Q  is  a  random  (2k,  2k) 
matrix. 

Secret  quantities  of  each  user  are :  n2~\  m  a  binary  word  of  length 
2k,  e  a  binary  word  of  length  2k  and  weight  p,  u,  mG  and  p~l . 
Public  data  of  each  user  are  :  ak,  G,  x  =  mG  +  e  and  p. 

Suppose  that  A  wants  to  prove  its  identity  to  B.  The  protocol 
includes  r  rounds,  each  of  these  being  performed  as  follows : 
•  A  computes  a  random  element  p  of  F2k  and  v  a  random  vector  of
length  2k.  Let  w  =  u  +  v,  A  computes  y  —  (pv i,  p v2)  and  sends  to 
B  the  quantity  yp”1, 

•  B  sends  back  z  =  (yp~l)G, 

•  A  randomly  computes  :  a  permutation  a  of  {1,  . . . ,  2k]  and  zrt2. 
The  first  2k  bits  of  this  vector  are  equal  to  (pw{,  r)w2)G  (since  uS  =  0). 
Letr  =  (r}W\,  r]w2),  A  sends  to  B  :  c\  —<  o  >,c2  =<  (r+m)Ga  >, 
c3  =<  (rG  +  x)o  > 

• B sends a random element e of {0, 1, 2},

• If e is 0, A discloses $r + m$ and $\sigma$. B checks the validity of $c_1$ and
$c_2$,

• If e is 1, A discloses $(r + m)G\sigma$ and $e\sigma$. B checks the validity of
$c_2$ and $c_3$ and verifies that $\omega(e\sigma) = p$,

• If e is 2, A discloses $r$ and $\sigma$. B checks the validity of $c_1$ and $c_3$.

The security of the scheme is linked to the values of the parameters
$k$, $p$, $\omega_{\beta^*}(\rho)$ and $r$.

III. Security of the scheme
According to [2], minimal parameters which guarantee the security
of the scheme are: $k = 255$, $p = 56$, $\omega_{\beta^*}(\rho) = 20$ and $r = 35$.
The complexity of the various attacks is then at least $2^{70}$, and the
probability of success of the different frauds is about $10^{-6}$.
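
The order of magnitude of the fraud probability can be checked with a short calculation. The sketch below assumes, as in Stern's three-challenge protocol and not stated explicitly in this abstract, that a cheating prover can prepare for at most two of the three challenges and therefore passes a single round with probability 2/3; the function name is illustrative only.

```python
# Assumption (not stated in the text): a cheater passes one round
# with probability 2/3, and the r rounds are independent.
def fraud_probability(rounds: int, per_round: float = 2.0 / 3.0) -> float:
    """Probability that a cheater survives every one of `rounds` rounds."""
    return per_round ** rounds


p35 = fraud_probability(35)
print(f"fraud probability after 35 rounds: {p35:.2e}")
```

Under this assumption, r = 35 rounds give a value just below $10^{-6}$, which is consistent with the figure quoted above.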

It can be shown that repetition of the protocol is a "proof of knowledge"
of a solution of the system $x = mG + e$, $\omega(e) = p$. Moreover,
we believe that this scheme is computationally zero-knowledge.

IV. Performances of the scheme
The prover does not have to store the matrix $G$ or the matrix $\tilde{G}$,
since he does not execute any computation with these matrices. Thus,
the prover needs only a very small amount of memory. Using the basis $\beta_1$
allows the prover to compute $y\rho^{-1}$ without storing $\rho^{-1}$. Moreover, the
complexity of the computations done by the prover can be reduced by
using an irreducible trinomial [5] to generate $F_{2^k}$. To show the
efficiency of the scheme, we have compared it with Stern's scheme [4],
which is the most practical among the schemes based on the SD problem.


                            Stern's scheme    Our scheme
Global transmission rate    ~40133 bits       ~75740 bits
ROM                         66048 bits        2415 bits
Prover's workfactor         ~2^{22.13}        < 21"
Extended Abstract

We consider an alternate approach to coding information-bearing
data for the reliable transmission of two-tone images
over noisy communication channels with memory. This consists
of jointly designing the source and channel codes (a technique
referred to as joint source-channel coding).

Source and channel coding are two problems that have
traditionally been implemented separately, forming what is
known as a tandem source-channel coding system. The separation
of channel and source coding is only optimal in an
asymptotic sense, i.e., when no constraints exist on the coding
block lengths (delay) and on the complexity of the
encoder/decoder [1]. Joint source-channel coding, however, has
recently received increased attention. It has been shown that
if delay and complexity are constrained, performance can be
increased if the source and channel codes are jointly designed,
as opposed to being treated independently [2, 3].

In this work, we propose joint source-channel coding
schemes for the reliable transmission of two-tone images over
a binary channel with additive Markov noise. Applications of
this work are in the transmission of facsimile documents over
land mobile radio channels.

We model the image as a one-dimensional non-uniform binary
i.i.d. process, a Markov process, or a two-dimensional causal
Markov process. We then investigate the problem of the maximum
a posteriori probability (MAP) detection of binary images
directly transmitted over the Markov channel. The objective
is to design a MAP detector that fully exploits the
redundancy of binary images to combat channel noise. It will
also exploit the larger capacity of the channel with memory as
opposed to the interleaved (memoryless) channel. Since this is
a model-based decoding algorithm, we assume that the image
parameters are provided to the decoder (this can be achieved
by transmitting them over the channel using a forward
error-control code). We next address the problem of MAP detection
of compressed binary images directly transmitted over
the Markov channel. Comparisons of the performance of the
above coding schemes with traditional tandem schemes (that
use run-length and Huffman coding for source coding, and
convolutional codes and interleaving for channel coding) are
also presented.

Simulation results for the transmission and detection of an
uncompressed two-tone image of Lena are displayed in Figures
1 and 2. In this experiment, the Markov channel bit error rate
is $\Pr(Z_n = 1) = \epsilon = 0.1$ and the noise correlation parameter
is $\delta = 10.0$ (the corresponding noise correlation coefficient is
10/11). These parameters correspond to a very noisy channel
with high noise correlation. The resulting average decoding
bit error probability is 0.02. This result is very promising
given the low complexity of the system (which primarily resides
in the MAP decoder). The decoder is implemented using
a modified version of the Viterbi algorithm.


Figure 1. Received two-tone Lena


Figure 2. Decoded two-tone Lena

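
To make the idea of sequence MAP detection over a channel with memory concrete, here is a minimal sketch for the simplest case: a first-order binary Markov source sent uncoded over a binary channel with additive first-order Markov noise. It is an illustration under stated assumptions, not the authors' decoder. The noise transition rule $\Pr(Z_n = 1 \mid Z_{n-1}) = (\epsilon + \delta Z_{n-1})/(1+\delta)$, all function names, and the source parameters are assumptions introduced here.

```python
import itertools
import math


def noise_trans(z_prev, z, eps, delta):
    """P(Z_n = z | Z_{n-1} = z_prev) for the assumed first-order noise model."""
    p1 = (eps + delta * z_prev) / (1.0 + delta)
    return p1 if z == 1 else 1.0 - p1


def log_joint(x, y, p_src, src_trans, eps, delta):
    """log P(x) + log P(z), where z = x XOR y is the implied noise sequence."""
    z = [a ^ b for a, b in zip(x, y)]
    total = math.log(p_src[x[0]]) + math.log(eps if z[0] else 1.0 - eps)
    for n in range(1, len(x)):
        total += math.log(src_trans[x[n - 1]][x[n]])
        total += math.log(noise_trans(z[n - 1], z[n], eps, delta))
    return total


def map_viterbi(y, p_src, src_trans, eps, delta):
    """Viterbi search; the state at time n is the source bit x_n."""
    score = [math.log(p_src[s]) + math.log(eps if (s ^ y[0]) else 1.0 - eps)
             for s in (0, 1)]
    back = []
    for n in range(1, len(y)):
        new_score, ptr = [], []
        for s in (0, 1):
            # The noise bits follow from the hypothesised source bits and y.
            cands = [score[r] + math.log(src_trans[r][s])
                     + math.log(noise_trans(r ^ y[n - 1], s ^ y[n], eps, delta))
                     for r in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            new_score.append(cands[best])
            ptr.append(best)
        score = new_score
        back.append(ptr)
    s = 0 if score[0] >= score[1] else 1
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    return path[::-1]


def map_brute_force(y, p_src, src_trans, eps, delta):
    """Exhaustive MAP search, feasible only for short sequences (a check)."""
    return max((list(x) for x in itertools.product((0, 1), repeat=len(y))),
               key=lambda x: log_joint(x, y, p_src, src_trans, eps, delta))
```

Because the noise bit at each position is determined by the hypothesised source bit and the received bit, two trellis states suffice here; a two-dimensional image model or a compressed source would need a larger state space.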
Abstract - A hierarchical scheme for chain encoded digital
contours is introduced. If the contour represents the
boundary of a binary image, a true digitization of the image is
realized as the pyramid structure goes from the fine to coarser
resolution levels. A progressive transmission system is
designed to go from a coarse resolution level to finer levels
which uses essentially the same number of bits as
transmission of the contour at the finest resolution.

SUMMARY 

We  consider  a  pyramidal  structure  for  digital  binary  images 
which  lends  itself  to  efficient  multiresolution  transmission. 
Binary  images  of  objects  are  digitized  by  coloring  a  pixel  black 
if  the  center  point  of  the  pixel  cell  is  within  the  object. 
Otherwise,  it  is  white. 

Quadtrees  are  commonly  used  to  create  multiresolution 
structures.  In  this  method  the  pixel  cells  which  intersect  the 
boundary  of  the  object  are  designated  as  gray  cells.  It  is  only 
these  cells  that  need  to  be  subdivided  to  obtain  a  finer  level  of 
representation. Gray cells (or nodes) of the quadtree are
designated as either gray-colored black or gray-colored
white. Consequently, node designations are of four types:
white, black, gray-white and gray-black. Thus, for a
hierarchical representation using quadtrees, 8 bits are used to
describe the four higher resolution children cells of each
coarse resolution gray cell.

A  second  approach  for  the  encoding  of  binary  images  is  to 
use  chain  codes  to  follow  the  boundary  of  the  digitized 
object. This consists of using 4-directional links that follow
the edges or "cracks" of the pixel cells, and hence these codes are
sometimes referred to as crack codes.

The  crack  code  requires  2  bits  per  link  and  on  average  the 
number  of  links  equals  the  number  of  gray  cells  for 
quadtrees. For contour following codes the number of links
doubles with each level of increased resolution. Thus the
"brute  force"  method  of  simply  transmitting  the  full  crack 
code  at  each  level  of  resolution  uses  4  bits  per  coarse  link,  half 
that  for  quadtrees. 

A  pyramidal  structure  for  the  crack  code  which  makes  use  of 
the  coarse  information  while  transmitting  information  for  the 
finer  resolution  could  give  further  improvement.  However, 
one  finds  that  for  digital  binary  images  the  content  in  the  fine 
resolution  is  not  sufficient  to  determine  the  coarse  resolution 
image.  This  is  due  to  the  fact  that  as  4  small  adjacent  pixel 
cells  coalesce  into  a  coarser  cell,  knowledge  of  whether  the 
center  point  of  the  coarser  cell  is  within  the  object  is  not  given 
by  the  colors  of  the  smaller  cells. 

Conversely, this implies that when going from coarse to fine,
a portion of the coarse information is not relevant when
providing additional information for the fine resolution
image. Thus for an efficient multiresolution system one
should search for a structure whereby the coarse information
is contained in the finer. This can be achieved by shifting (in
each direction) the pixel cells, or equivalently the image, by
1/2 the value of the side of a cell. In this way the center point
of a coarse cell coincides with that of a smaller cell.

We  construct  such  a  multiresolution  structure  for  contour 
following  chain  codes.  The  total  number  of  bits  required  for 
transmission  if  sent  progressively  is  shown  to  be  essentially 
the  same  as  that  for  transmission  only  at  the  final  level  of 
resolution.  Thus,  in  a  sense,  no  coding  inefficiency  is  created 
by  the  multiresolution  structure  and  the  scheme  makes  full 
use  of  the  coarse  data.  Each  level  of  resolution  requires  2  bits 
per  coarse  link  or  coarse  cell  as  opposed  to  8  bits  for 
quadtrees. 
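
The bit counts quoted in this summary can be tallied with a small calculation. The sketch below is an idealisation: it assumes the number of links exactly doubles at each finer level and equals the number of gray quadtree cells at that level, and it ignores the cost of describing the quadtree root. The function and key names are illustrative.

```python
def bit_budgets(coarse_links: int, levels: int) -> dict:
    """Idealised bit counts for `levels` refinements above the coarsest level."""
    # links[0] is the coarsest level, links[-1] the finest.
    links = [coarse_links * 2 ** j for j in range(levels + 1)]
    # Crack code sent once, at the finest level only (2 bits per link).
    finest_only = 2 * links[-1]
    # "Brute force": the full crack code is resent at every level.
    brute_force = sum(2 * n for n in links)
    # Proposed pyramid: full code at the coarsest level, then
    # 2 bits per coarse link for each refinement step.
    progressive = 2 * links[0] + sum(2 * n for n in links[:-1])
    # Quadtree: 8 bits per gray cell for each refinement step.
    quadtree = sum(8 * n for n in links[:-1])
    return {"finest_only": finest_only, "brute_force": brute_force,
            "progressive": progressive, "quadtree": quadtree}


budgets = bit_budgets(coarse_links=16, levels=4)
print(budgets)
```

Under these assumptions the progressive total equals the cost of sending only the finest level, which matches the claim that the multiresolution structure introduces essentially no coding inefficiency.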
Abstract - An efficient coding method using a
three-dimensional  discrete  cosine  transform  (DCT) 
for  still  images  is  presented.  This  is  an  extended 
version  of  the  traditional  DCT  coding  method.  The 
adaptive  application  of  the  three-dimensional  DCT  to 
each  sub-block  makes  the  coding  more  efficient  than 
the  other  DCT  methods. 

I.  Introduction 

In the near future an information superhighway will be operating
all over the world. Against this background, an image coding method
has been standardized by the international body called the Joint
Photographic Experts Group (JPEG) [1]. Even so, an efficient
and fine image coding technique is required as urgently as
ever. In this work we demonstrate that a more attractive
coding method for still images can be built using a
three-dimensional DCT.

II.  Image  Coding 

Traditional DCT image coding is done by partitioning the full
picture into small sub-blocks and coding each one separately.
The block size is usually taken as 8 to 32 pels. At such a small
size there are strong correlations between a block and its
neighboring blocks. Hence there is redundancy in applying
transform coding to each sub-block independently. To remove this
redundancy, we adopt a three-dimensional DCT for the difference
between sub-blocks. First we take nine sub-blocks (3x3
blocks) as a unit, as shown in Fig. 1. Each sub-block is square
with 8 pels. The nine sub-blocks are ordered as in Fig. 1.
After each sub-block is transformed by a two-dimensional DCT,
the DCT coefficients of the 0th sub-block are subtracted from
those of the other sub-blocks.




Fig.  1:  Sub-block  and  its  ordering 


That is to say, with $D_l(i,j)$ the $(i,j)$th DCT coefficient
of the $l$th sub-block, we calculate the differences
$D_l(i,j) - D_0(i,j)$ $(i,j = 1, 2, \ldots, 8)$ for every $l$
$(l = 1, 2, \ldots, 8)$ and store these differences in the $l$th
sub-block. Next, using these sub-blocks, we build a cubic
structure as in Fig. 2, and a one-dimensional DCT is applied
again in the depth direction of the cubic structure.
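
The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an orthonormal DCT-II, assumes that the cube consists of the eight difference sub-blocks (with the 0th sub-block coded separately), and uses hypothetical function names.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out


def dct_2d(block):
    """Separable 2-D DCT: rows first, then columns."""
    rows = [dct_1d(r) for r in block]
    size = len(rows)
    cols = [dct_1d([rows[i][j] for i in range(size)]) for j in range(size)]
    return [[cols[j][u] for j in range(size)] for u in range(size)]


def unit_to_cube(sub_blocks):
    """sub_blocks: nine 8x8 pel blocks; index 0 is the reference sub-block.

    Returns the 2-D DCT of sub-block 0 and the 8x8x8 cube obtained by a
    depth-direction DCT of the coefficient differences D_l - D_0.
    """
    coeffs = [dct_2d(b) for b in sub_blocks]
    d0 = coeffs[0]
    size = len(d0)
    diffs = [[[coeffs[l][i][j] - d0[i][j] for j in range(size)]
              for i in range(size)] for l in range(1, len(coeffs))]
    cube = [[[0.0] * size for _ in range(size)] for _ in range(len(diffs))]
    for i in range(size):
        for j in range(size):
            depth = dct_1d([diffs[l][i][j] for l in range(len(diffs))])
            for l, val in enumerate(depth):
                cube[l][i][j] = val
    return d0, cube
```

When the nine sub-blocks are nearly identical, the differences are small and the depth transform packs what remains into a few coefficients, which is the redundancy the method aims to remove.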


Fig.  2:  Cubic  structure 


If the unit includes clear edges of the image, then
the traditional coding method (JPEG method) is applied to
each sub-block of the unit, because separate processing of
each sub-block is preferable in such a case. Hence, the
proposed method is a hybrid of two-dimensional and
three-dimensional DCTs. The total bit rate is determined from the
rate distortion function and the quantization levels from
Max's theory. For simplicity, however, using a constant
reduction area and equidistant quantization levels we can
obtain a simpler coding method which is still effective.

III.  Simulation 

The images reconstructed by this method at 1 bit/pel
have SNRs of 29 dB to 33 dB. The image quality is good in
all cases, and block noise does not appear in these images.

IV.  Conclusions 

Using a three-dimensional DCT we have constructed an efficient
transform image coding method. This method can play an
important role in the fast transmission of still images.
Abstract - We address the problem of data
compression and transmission applied to
images. We present a differential pulse coded
modulation (DPCM) scheme whose prediction
filter is the Lℓ filter. We also present an ECC
decoder  which  takes  advantage  of  side 
information  to  selectively  filter  the 
reconstructed  prediction  difference  sequence, 
using  the  weighted  median  filter.  We  apply  our 
schemes  to  some  test  images,  and  we  compare 
their  performance  to  the  appropriate  baseline 
schemes.  Our  techniques  exhibit  a  lot  of 
resilience  to  noise,  even  for  very  noisy 
channels. 

I.  Introduction 

DPCM  is  a  well  known  technique  for  the  compression  of 
correlated  sources,  and  has  been  widely  used  in  the 
compression  of  speech,  images,  and  video.  It  is  simple  and 
remarkably  effective.  DPCM  is  appropriate  in  applications 
where  the  hardware  costs  need  to  be  kept  low.  Examples 
include  many  telephone  systems  and  personal  wireless 
communication  systems.  It  is  also  appropriate  in 
applications  such  as  digital  television,  where  the  large  data 
rate  requires  state  of  the  art  high  speed  electronics,  which 
may  prohibit  the  use  of  complex  compression  schemes. 

Because  of  its  differential  nature,  DPCM  can  suffer 
from  acute  sensitivity  to  bit  errors.  That  is,  a  single  bit 
error  can  affect  many  reconstructed  samples.  To  mitigate 
this effect, in this paper we propose to use the Lℓ filter of
Palmieri  and  Boncelet.  This  filter  is  a  good  predictor,  and  it 
also  significantly  reduces  DPCM's  sensitivity  to  bit  errors. 

We  also  present  a  modified  ECC  decoder  which  takes 
advantage  of  side  information  to  selectively  filter  the 
reconstructed  prediction  difference  sequence.  This  decoder 
uses  the  weighted  median  filter.  This  modified  decoder 
further  enhances  the  scheme’s  resilience  to  channel  errors. 

II.  The  Coding  Scheme 

We propose the Lℓ filter as a predictor in the DPCM
feedback loop. Our filter is given by

wtJ  =  ah j  V 2J  +  Vw-l 

+  o Mj  V \j  +  °W-2  %j- 2  +  aD-i  V l 

where the coefficients a depend on the ranking of the
elements u. We consider a special case of the Lℓ filter, where
the only ranking information used consists of the location
of the largest and smallest u, which we discard by making
their coefficients equal to zero. For the remaining three
elements, we choose the filter that minimizes the MSE in
the absence of a quantizer. One can see that our choice of Lℓ
filter behaves like both a linear filter and a median filter.
Its linearity makes it a good predictor. Its nonlinearity
gives noise immunity to the DPCM decoder, because it
provides a mechanism for discarding outliers.
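
The special case described above can be sketched as a small predictor. This is an illustration under assumptions: the support is taken to be five previously reconstructed neighbours (the filter equation has five terms and three remain after trimming), and the default equal weights are placeholders rather than the MSE-optimal coefficients, which are not given here. The function name and the `weights` table are hypothetical.

```python
def trimmed_l_ell_predict(neighbours, weights=None):
    """Predict a pel from five neighbours, ignoring the largest and smallest.

    `weights`, if given, maps (index_of_min, index_of_max) to a triple of
    coefficients for the three kept positions, so the linear part may
    depend on where the extremes were found (the ranking information).
    """
    if len(neighbours) != 5:
        raise ValueError("this sketch assumes a five-element support")
    lo = min(range(5), key=lambda i: neighbours[i])
    hi = max(range(5), key=lambda i: neighbours[i])
    if lo == hi:
        # All values equal: drop any two distinct positions.
        hi = (lo + 1) % 5
    kept = [i for i in range(5) if i not in (lo, hi)]
    if weights is None:
        coeffs = (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0)  # placeholder weights
    else:
        coeffs = weights[(lo, hi)]
    return sum(c * neighbours[i] for c, i in zip(coeffs, kept))


# A channel error has turned two neighbours into outliers (255 and 0);
# the prediction is formed from the three remaining values.
print(trimmed_l_ell_predict([100, 102, 101, 255, 0]))
```

The example shows the outlier-discarding behaviour: corrupted extreme values never enter the linear combination, so a single bit error has limited influence on later predictions.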

We  also  propose  a  modified  ECC  decoder  that  takes 
advantage  of  two  sources  of  additional  information.  The 
first  is  the  residual  redundancy  in  the  prediction  difference 
v.  Generally  speaking,  v  is  not  highly  correlated,  but  it 
retains  enough  local  correlation  to  help  the  decoder.  The 
decoder  takes  advantage  of  this  by  filtering  the  prediction 
difference  estimate  v’.  The  filter  will  be  applied 
selectively,  only  when  the  decoder  has  a  low  confidence  in 
its  output.  Our  choice  of  filter  is  the  center  weighted  median 
filter. 

The  second  source  of  additional  information  comes 
from  the  ECC  decoder,  which  can  produce  an  estimate  for 
the  error  pattern  e  introduced  by  the  noisy  channel.  If  the 
weight  of  e  is  zero,  the  decoder  is  very  confident  in  its 
decision.  As  the  weight  increases,  it  becomes  less  and  less 
confident.  We  set  a  threshold  x  >  0,  and  let  the  decoder 
enable  the  filter  each  time  the  weight  exceeds  x. 
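
A minimal sketch of this selective filtering follows. It assumes one error-weight estimate per sample of the prediction difference, a five-sample window, and a centre weight of three; none of these values is given in the text, and the function names are illustrative.

```python
def center_weighted_median(window, center_weight=3):
    """Median of the window with the centre sample counted `center_weight` times."""
    c = len(window) // 2
    expanded = sorted(list(window) + [window[c]] * (center_weight - 1))
    return expanded[len(expanded) // 2]


def selective_filter(v_est, error_weights, threshold,
                     half_width=2, center_weight=3):
    """Filter v_est[n] only where the decoder's error weight exceeds the threshold."""
    out = list(v_est)
    for n, w in enumerate(error_weights):
        lo, hi = n - half_width, n + half_width
        # Low-confidence samples near the ends are left unfiltered in this sketch.
        if w > threshold and lo >= 0 and hi < len(v_est):
            out[n] = center_weighted_median(v_est[lo:hi + 1], center_weight)
    return out


v_est = [0, 1, 50, 1, 0, 2, 1]     # decoded prediction differences
weights = [0, 0, 2, 0, 0, 0, 0]    # estimated error-pattern weights
print(selective_filter(v_est, weights, threshold=1))
```

Samples decoded with zero error weight pass through untouched, so the filter cannot blur reliable data; only the suspicious sample is replaced by a value consistent with its neighbours.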

III.  Examples 

We apply our schemes to some test images. We compare
the compression of DPCM with an Lℓ filter (Lℓ-DPCM) and
its behavior in the presence of bit errors to that of two
baseline schemes: DPCM with a linear filter (ℓ-DPCM) and
DPCM with a median filter (M-DPCM). In terms of
quantization, the mean squared error (MSE) performance of
Lℓ-DPCM is better than that of M-DPCM and worse than that of
ℓ-DPCM, but all three are good. Visually, the three schemes are
essentially identical for quantizers with 3 bits and above.

In terms of channel response, without ECC, and with
ECC and a standard decoder, Lℓ-DPCM sometimes beats
ℓ-DPCM in MSE, and sometimes not, while M-DPCM is a
distant third. Visually, Lℓ-DPCM does the best job of
concealing distortion, since it suffers the least from the
very noticeable streaks typical of linear DPCM. Using ECC
with the modified decoder helps all three DPCM schemes in
MSE, with ℓ-DPCM benefiting the most, and M-DPCM the
least, because the median filter is not a particularly good
predictor. Again, visually, Lℓ-DPCM wins.
A fundamental problem in two-dimensional signal processing
is the modeling and analysis of nonhomogeneous
two-dimensional (2-D) signals. For example, in almost any image
taken by a camera, perspective exists, and hence the acquired
2-D signal is nonhomogeneous, even if the original scene was
homogeneous. Conventional approaches to the problems of
perspective and camera orientation estimation usually involve
local analysis of the image, by means of edge detection
algorithms. Parametric models, when used in image processing,
generally assume the observed image to be homogeneous, or
piece-wise homogeneous. In this paper we consider a parametric
model which is nonhomogeneous, and attempts to perform
global (or at least, less localized) image analysis. We will
study a model consisting of a sine (or cosine) of a polynomial
function of the image coordinates.

For practical reasons it is more convenient to work with a
complex valued model in which the sinusoidal function is
replaced by a complex exponential. In applications where the
2-D signal is real, it can be converted, subject to some restrictive
conditions, into complex form through the Hilbert Transform.
Throughout this paper we will consider 2-D signals which can
be represented by a constant amplitude complex exponential
whose phase is a polynomial function of the coordinates.

Let $\{v(n,m)\}$ be the 2-D field which is given by

$$v(n,m) = A \exp\{j\phi_{S+1}(n,m)\}, \qquad \phi_{S+1}(n,m) = \sum_{(k,\ell) \in I} c(k,\ell)\, n^{k} m^{\ell}, \quad (1)$$

where $I = \{0 \le k,\ell \text{ and } 0 \le k+\ell \le S+1\}$. We shall call
$\phi_S(n,m)$ a 2-D polynomial of total-degree $S$. In other words,
one might think of the phase polynomial $\phi_S(n,m)$ as if it has
$S$ 'layers', since increasing $S$ by one adds a 'layer' of
$S+2$ additional parameters to the phase model.

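Definition 1: Let $\tau_m$ and $\tau_n$ be some positive constants.
Define the operators $PD_{m^{(q)}}[\cdot]$ and $PD_{n^{(p)}}[\cdot]$ recursively by (2) and (3).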


Let $PD_S[v(n,m)]$ be the 2-D signal obtained by successively
applying, in some arbitrary sequence, $P$ times the operator
$PD_{n^{(1)}}[\cdot]$ and $S-P$ times the operator $PD_{m^{(1)}}[\cdot]$ to the
signal (1). Then $PD_S[v(n,m)]$ is the 2-D exponential

$$PD_S[v(n,m)] = A^{2^S} \exp\{ j[\omega_S n + \nu_S m + \gamma_S(\tau_n, \tau_m)] \}, \quad (4)$$

whose spatial frequencies are given by
$\omega_S = (-1)^S c(P+1, S-P)\,(P+1)!\,(S-P)!\,\tau_n^{P} \tau_m^{S-P}$,
$\nu_S = (-1)^S c(P, S+1-P)\,P!\,(S+1-P)!\,\tau_n^{P} \tau_m^{S-P}$,
and $\gamma_S(\tau_n, \tau_m)$ is neither a function of $m$ nor of $n$.

We can thus reduce any 2-D nonhomogeneous polynomial
phase signal $v(n,m)$, whose phase is of total degree $S+1$, to a
2-D single tone whose frequency is $(\omega_S, \nu_S)$. Hence, estimating
$(\omega_S, \nu_S)$ using any standard frequency estimation technique
results in an estimate of $c(P+1, S-P)$ and $c(P, S+1-P)$.
At present we estimate the frequency of the exponential using
a search for the maximum of the absolute value of the signal's
2-D Discrete Fourier Transform. We have thus obtained an
estimate of two of the parameters of the highest order 'layer',
$S+1$, of the phase model parameters (i.e., those $c(k,\ell)$'s for
which $0 \le k,\ell$ and $k+\ell = S+1$). However, the highest order
'layer', $S+1$, of the phase model parameters has $S+2$ parameters,
which need to be estimated. This can be achieved by
repeating the procedure which was described above, assuming
some arbitrary $P$, for all $P$ such that $0 \le P \le S$.
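
A small numerical sketch of one reduction step follows, for a phase of total degree 2 (so $S = 1$) with $P = 1$, i.e. a single application of the $n$-direction operator followed by a DFT peak search. It is an illustration only: the coefficient values are chosen here so that the resulting tone falls exactly on the DFT grid, the signal is noise-free, and the function names are hypothetical.

```python
import cmath
import math


def poly_phase_signal(N, M, coeffs, amplitude=1.0):
    """v(n, m) = A exp(j * sum over (k, l) of c(k, l) n^k m^l)."""
    return [[amplitude * cmath.exp(
                1j * sum(c * n ** k * m ** l for (k, l), c in coeffs.items()))
             for m in range(M)] for n in range(N)]


def pd_n(v, tau):
    """One application of the n-direction operator: v(n, m) * conj(v(n + tau, m))."""
    return [[v[n][m] * v[n + tau][m].conjugate() for m in range(len(v[0]))]
            for n in range(len(v) - tau)]


def dft_peak(v, K):
    """Frequency pair on a K x K grid that maximises the 2-D DFT magnitude."""
    best_mag, best = -1.0, (0.0, 0.0)
    for a in range(K):
        for b in range(K):
            w, u = 2 * math.pi * a / K, 2 * math.pi * b / K
            acc = sum(v[n][m] * cmath.exp(-1j * (w * n + u * m))
                      for n in range(len(v)) for m in range(len(v[0])))
            if abs(acc) > best_mag:
                best_mag, best = abs(acc), (w, u)
    return best


K, tau = 16, 1
true_c = {(2, 0): -2 * math.pi * 3 / 32, (1, 1): -2 * math.pi * 5 / 16,
          (0, 2): 0.07, (1, 0): 0.3, (0, 1): -0.2, (0, 0): 0.5}
v = poly_phase_signal(K, K, true_c)
omega, nu = dft_peak(pd_n(v, tau), K)
# With S = 1 and P = 1: omega = -2 c(2,0) tau and nu = -c(1,1) tau.
c20_hat = -omega / (2 * tau)
c11_hat = -nu / tau
print(c20_hat, c11_hat)
```

The lower-order coefficients and $c(0,2)$ only affect the constant phase term after the differencing step, so they do not move the peak; recovering $c(0,2)$ would require the $P = 0$ case, which applies the $m$-direction operator instead.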

Multiplying $v(n,m)$ by $\exp\{-j \sum_{k=0}^{S+1} c(k, S+1-k)\, n^{k} m^{S+1-k}\}$
results in a new polynomial phase signal whose
total degree is $S$. By applying to the resulting signal a procedure
similar to the one used to estimate the parameters $c(k,\ell)$
for $k+\ell = S+1$, we obtain an estimate of the $S+1$ parameters
in the $S$ 'layer'. Let $v^{(s+1)}(n,m)$ denote the 2-D signal, where
$s+1$ denotes the current total-degree of its phase polynomial.
By repeating for all $s = S, \ldots, 0$ the two basic steps of
estimating the $c(k,\ell)$ parameters of 'layer' $s+1$ through finding
the maxima of

$$\left| \mathrm{DFT}\left\{ PD_{m^{(s-P)}}\left[ PD_{n^{(P)}}[v^{(s+1)}(n,m)] \right] \right\} \right|$$

for all $0 \le P \le s$, followed by multiplying the already reduced
order 2-D polynomial phase signal by
$\exp\{-j \sum_{k=0}^{s+1} c(k, s+1-k)\, n^{k} m^{s+1-k}\}$ in the next step,
we obtain estimates for all the phase parameters.

The operators of Definition 1 are

$$PD_{m^{(q)}}[v(n,m)] = PD_{m^{(q-1)}}[v(n,m)] \left\{ PD_{m^{(q-1)}}[v(n, m+\tau_m)] \right\}^{*},$$
$$n = 0, 1, \ldots, N-1, \quad m = 0, 1, \ldots, M-1-q\tau_m \quad (2)$$

$$PD_{n^{(p)}}[v(n,m)] = PD_{n^{(p-1)}}[v(n,m)] \left\{ PD_{n^{(p-1)}}[v(n+\tau_n, m)] \right\}^{*},$$
$$n = 0, 1, \ldots, N-1-p\tau_n, \quad m = 0, 1, \ldots, M-1 \quad (3)$$

where $PD_{m^{(0)}}[v(n,m)] = PD_{n^{(0)}}[v(n,m)] = v(n,m)$.



In many cases the observed 2-D signal is corrupted by
additive white Gaussian noise. In this paper we derive the
exact Cramér-Rao Lower Bound (CRLB) on the accuracy of
estimating the model parameters in the presence of additive
white Gaussian noise. The performance of the algorithm is
illustrated by numerical examples, and its performance is
compared with the Cramér-Rao bound.
Abstract - This paper presents a restoration algorithm
based on a local signal description using discrete
polynomials. The algorithm is made adaptive
by estimating the local signal-to-noise ratio and by
computing the corresponding deblurring filter.

Furthermore, this method is developed for discrete
signals, the input and output images being almost always
available as discrete signals.

I.  Introduction 

Methods to describe, restore and compress signals by
means of polynomials have already been developed by
Martens  [1,  2]  and  Philips  [4].  The  basic  idea  behind 
these  methods  is  the  computation  of  filters  in  order  to 
estimate  the  polynomial  coefficients  describing  the  ideal 
signal,  starting  from  the  degraded  signal. 

Martens  [2],  applying  these  methods  to  image  restora¬ 
tion  assumes  that  each  sample  of  the  sampled  degraded 
image  corresponds  to  the  zero-order  term  of  the  ideal  im¬ 
age  polynomial  expansion.  This  implies  that  the  blurring 
kernel  is  identical  to  the  squared  local  window  function 
used  to  describe  the  signal. 

In  the  proposed  method,  no  other  assumption  is  made 
about  the  blurring  kernel  than  a  general  low-pass  be¬ 
haviour.  This  allows  the  choice  of  arbitrary-shaped  blur¬ 
ring  functions  and  of  arbitrary  positions  for  the  localisa¬ 
tion  windows. 

II.  Discrete  polynomial  transforms 

This transform consists in approximating the localised
signal using polynomials. These polynomials are orthonormal
with respect to a window function $V(i)$, i.e.
they are defined by

$$(G_n, G_m) = \sum_i V^2(i)\, G_n(i)\, G_m(i) = \delta_{n,m} \quad (1)$$

and the coefficients of the polynomial expansion are obtained
in the usual way.

When  the  localising  function  is  a  binomial,  the  orthog¬ 
onal  polynomials  to  be  used  are  the  Krawtchouk  polyno¬ 
mials. 

The  extension  to  two  dimensions  is  trivial  when  a  sep¬ 
arable  localising  function  is  considered. 
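
The orthonormality condition (1) can be illustrated numerically. The sketch below builds the polynomials by Gram-Schmidt orthonormalisation of the monomials under the weighted inner product; it assumes that the squared window is the binomial $V^2(i) = \binom{N}{i} / 2^N$ for $i = 0, \ldots, N$, and that expansion coefficients are the weighted inner products of the signal with each polynomial. Both are assumptions made here, and the function names are illustrative. With this window the resulting polynomials are, up to normalisation, the Krawtchouk polynomials.

```python
import math


def binomial_window_sq(N):
    """Assumed squared window V^2(i) = C(N, i) / 2^N for i = 0..N."""
    return [math.comb(N, i) / 2.0 ** N for i in range(N + 1)]


def inner(v2, f, g):
    """Weighted inner product <f, g> = sum_i V^2(i) f(i) g(i)."""
    return sum(w * a * b for w, a, b in zip(v2, f, g))


def orthonormal_polys(v2, order):
    """Gram-Schmidt on 1, i, i^2, ... ; returns sampled polynomials G_0..G_order."""
    basis = []
    for n in range(order + 1):
        g = [float(i) ** n for i in range(len(v2))]
        for q in basis:
            c = inner(v2, g, q)
            g = [a - c * b for a, b in zip(g, q)]
        norm = math.sqrt(inner(v2, g, g))
        basis.append([a / norm for a in g])
    return basis


def expansion_coeffs(signal, v2, basis):
    """Coefficients of the local polynomial expansion of a windowed signal."""
    return [inner(v2, signal, q) for q in basis]


v2 = binomial_window_sq(8)
G = orthonormal_polys(v2, 4)
ramp = [2.0 * i + 1.0 for i in range(9)]
print(expansion_coeffs(ramp, v2, G))
```

A linear ramp has non-zero coefficients only for orders 0 and 1, which shows how low-order coefficients capture the smooth part of a localised signal.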

III.  Non  adaptive  restoration 

The  restoration  algorithm  consists  in  computing  the  co¬ 
efficients  of  the  polynomial  expansion  of  the  ideal  signal 
from  its  degraded  version  using  filters. 

*The authors can be contacted via E-mail at the following
addresses: Xavier.Neyt@elec.rma.ac.be and
Marc.Acheroy@elec.rma.ac.be


Note  that  because  of  the  noise  included  in  the  blurred 
signal,  it  is  not  possible  to  estimate  accurately  the  high 
order  polynomial  coefficients. 

The filters to use are obtained by minimising the mean
square error between the unknown coefficients $L_{n,k}$ and
their estimates $\hat{L}_{n,k}$.

These filters strongly depend on the local signal to
noise ratio in the ideal image, which must be estimated.

Selecting  a  constant  value  for  the  signal  to  noise  ra¬ 
tio  yields  permanent  restoration  filters,  hence  resulting 
in  non  adaptive  restoration. 

IV.  Adaptive  image  restoration 
To  make  the  algorithm  adaptive,  the  local  signal  variance 
must  be  estimated  in  each  window  and  the  corresponding 
filters  computed. 

Since  the  SNR  of  the  estimated  coefficients  of  low  order 
is  high,  even  if  the  SNR  of  the  filters  doesn’t  match  that 
of  the  image,  the  coefficients  computed  using  these  filters 
can  be  used  to  get  an  estimate  of  the  local  signal  variance 
hence  yielding  the  signal  to  noise  ratio. 

Having  the  SNR  of  the  ideal  image  in  each  window,  the 
estimation  filters  can  be  computed.  These  filters  applied 
to  the  blurred  image  will  give  estimates  of  the  coefficients 
of  the  ideal  image,  thus  yielding  the  restored  image. 

V.  Conclusions 

A  local  description  is  particularly  well  suited  for  adap¬ 
tive  restoration  methods.  Only  adaptivity  with  respect  to 
the  SNR  has  been  considered  here  but  a  spatially  variant 
blurring  filter  B  could  easily  be  considered  since  no  as¬ 
sumption  has  been  made  about  the  blurring  filter.  More¬ 
over,  due  to  the  property  of  orientation  selectivity  of  this 
kind  of  transform  (when  extended  to  2D),  a  directional 
restoration  could  also  be  implemented. 

Note  finally  that  this  local  description  enables  the  easy 
parallelisation  of  the  algorithms. 
Abstract - In this paper, we use the extreme elements to
obtain size and shape information of random structures.

I.- Introduction.

In textural analysis, Mathematical Morphology (M.M.) [1] uses
probabilistic models and the Granulometry to obtain size and
shape information. On the other hand, fractal objects are
characterized by applying morphological dilations. Here, we
propose an approach working with the extreme elements of a
Granulometry and of the Eroded set.


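II.- Morphological transformations
In M.M. the basic morphological transformations, the Dilation and
Erosion by B (structuring element), are given by

$$\delta_B(X) = X \oplus \check{B} = \bigcup \{X_b : b \in \check{B}\}$$

$$\varepsilon_B(X) = X \ominus \check{B} = \bigcap \{X_b : b \in \check{B}\}; \quad \check{B} = \{-x : x \in B\}$$

and the morphological closing and opening are defined by

$$\varphi_B(X) = \varepsilon_{\check{B}}(\delta_B(X)), \qquad \gamma_B(X) = \delta_{\check{B}}(\varepsilon_B(X)).$$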

Definition 1. A Granulometry is a family $\Psi_t$, for $t > 0$, such that
$\Psi_t$ is antiextensive and increasing for all $t$, and for all $s, t > 0$,

$$\Psi_s(\Psi_t(X)) = \Psi_t(\Psi_s(X)) = \Psi_{\sup(s,t)}(X)$$

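The opening $\gamma_{\lambda B}$, with B a compact convex set, satisfies these
axioms, and two functions are associated with it: the probability
distribution function and its derivative,

$$F(\lambda, X) = 1 - \frac{\mu(\gamma_{\lambda B}(X))}{\mu(X)}, \qquad g(\lambda, X) = \frac{d}{d\lambda} F(\lambda, X) \quad (1)$$

where $\mu$ is the Lebesgue measure (area in this case).

With the parameter $\lambda$ we associate the critical element $\lambda = \lambda_n$ for a
given set X: $\lambda_n = \sup\{\lambda : \gamma_{\lambda B}(X) \neq \emptyset\}$. In the same way, for the
erosion case, $\lambda_n = \sup\{\lambda : \varepsilon_{\lambda B}(X) \neq \emptyset\}$.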
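
A discrete sketch of these definitions may help. It is a simplification under assumptions: sets are finite sets of pixel coordinates, the measure is the pixel count, and the structuring element $\lambda B$ is taken to be a $\lambda \times \lambda$ square; the function names are illustrative.

```python
def square(lam):
    """lam x lam square structuring element with a corner at the origin."""
    return {(i, j) for i in range(lam) for j in range(lam)}


def erode(X, B):
    """Positions p such that the translate of B by p is contained in X."""
    b0 = next(iter(B))
    candidates = {(x[0] - b0[0], x[1] - b0[1]) for x in X}
    return {p for p in candidates
            if all((p[0] + b[0], p[1] + b[1]) in X for b in B)}


def dilate(E, B):
    """Union of the translates of B by the points of E."""
    return {(p[0] + b[0], p[1] + b[1]) for p in E for b in B}


def opening(X, B):
    """Union of all translates of B that fit inside X."""
    return dilate(erode(X, B), B)


def size_distribution(X, max_lam):
    """F(lam) = 1 - area(opening of X by the lam-square) / area(X)."""
    area = len(X)
    return {lam: 1.0 - len(opening(X, square(lam))) / area
            for lam in range(1, max_lam + 1)}


def critical_element(X, max_lam):
    """Largest lam (up to max_lam) for which the opening is non-empty."""
    sizes = [lam for lam in range(1, max_lam + 1) if opening(X, square(lam))]
    return max(sizes) if sizes else None


# Two grains: a 2x2 square and a 4x4 square, far apart.
X = ({(i, j) for i in range(2) for j in range(2)}
     | {(10 + i, 10 + j) for i in range(4) for j in range(4)})
F = size_distribution(X, 5)
print(F, critical_element(X, 5))
```

In this example the distribution jumps at the sizes where each grain stops surviving the opening, and the critical element is the side of the largest grain.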


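III.- Granulometry of Critical Elements.

Let $r_\gamma(\lambda, X) = r_\gamma(\lambda) = X - \gamma_{\lambda B}(X)$ be the residue operator of X
after application of $\gamma_{\lambda B}$, and $\lambda_n$ the critical element of X.
We use $\gamma_{\lambda B}$ and $\gamma_\lambda$ interchangeably and set $r_\gamma(\lambda_{n+1}) = X$ for
$\lambda_{n+1} > \lambda_n$. Recursively, we have
$r_\gamma(\lambda_n) = r_\gamma(\lambda_{n+1}) - \gamma_{\lambda_n B}(r_\gamma(\lambda_{n+1}))$ and,
for a given k, $r_\gamma(\lambda_k, r_\gamma(\lambda_{k+1})) = r_\gamma(\lambda_{k+1}) - \gamma_{\lambda_k}(r_\gamma(\lambda_{k+1}))$,
where $\lambda_k$ is the critical element of $r_\gamma(\lambda_{k+1})$. In other words,

$$\lambda_i = \sup\{\lambda : \gamma_{\lambda B}(r_\gamma(\lambda_{i+1})) \neq \emptyset\}$$

We associate two functions: the probability distribution function
of critical elements and its derivative,

$$F_c(\lambda_k, X) = \frac{\mu(X) - \sum_{i=k}^{n} \mu(\gamma_{\lambda_i}(r_\gamma(\lambda_{i+1})))}{\mu(X)} \quad (2)$$

We define $F_c(\lambda, X) = F_c(\lambda_k, X)$, $\forall \lambda \in [\lambda_k, \lambda_{k+1})$, and
$g_c(\lambda, X) = d(F_c(\lambda, X))/d\lambda$. Using a linear structuring element,
we have $g(\lambda, X) = g_c(\lambda, X)$ and $F(\lambda, X) = F_c(\lambda, X)$.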


To test this approach, we carry out a random geometrical
characterization by using a deterministic approach. In [2] it is
shown that the deterministic Sierpinski Gasket (S.G.) object behaves,
in percolation studies, similarly to the random S.G.
We use this physical assumption. In fact, the $F_c$ and $g_c$ functions
calculated on the complement of the deterministic S.G. are similar
to those of the random case. In this case we have

$$F_c(\lambda_k) = 1 - \frac{\sum_{i=k}^{n} \mu\big(\gamma_{\lambda_i}(r_\gamma(\lambda_{i+1}))\big)}{\mu(M)} = \left(\frac{4 - P}{4}\right)^k \qquad (3)$$

where $M$ is the mask or frame and $P$ is the filling probability used
to create a random S.G. From (3) we obtain a family of straight
lines with slope $\log((4 - P)/4)$:

$$\log F_c(\lambda_k) = k \log\left(\frac{4 - P}{4}\right)$$

By calculating the slope we estimate the filling factor and the
fractal dimension. We performed experiments to estimate the fractal
dimension of the union of two S.G. objects with the same fractal
dimension, and obtained the same fractal dimension. This approach is
now being applied to other fractal objects.
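
The slope-based estimate described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `estimate_filling_parameter`, the least-squares fit through the origin, and the synthetic measurements are all assumptions made for the example; only the relation log F_c(lambda_k) = k log((4 - P)/4) is taken from the text.

```python
import math

def estimate_filling_parameter(ks, F_values):
    """Fit log(F) = slope * k through the origin, then invert slope = log((4 - P)/4)."""
    num = sum(k * math.log(F) for k, F in zip(ks, F_values))
    den = sum(k * k for k in ks)
    slope = num / den
    P_hat = 4.0 * (1.0 - math.exp(slope))
    return slope, P_hat

# Synthetic, noise-free measurements generated from the model itself (hypothetical data).
P_true = 0.8
ks = [1, 2, 3, 4, 5]
F_values = [((4.0 - P_true) / 4.0) ** k for k in ks]

slope, P_hat = estimate_filling_parameter(ks, F_values)
print(f"slope = {slope:.4f}, estimated P = {P_hat:.4f}")
```

With measured data the points would scatter around the line, and the fit through the origin is one simple choice among several for extracting the slope.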

IV.-  Dead  leaves  model 

An appropriate model for overlapping grains, when the contour of the
grains is apparent, is the Dead Leaves Model [3]. A Dead Leaves
simulation $X$ is constructed by implanting independent
realizations of primary grains $X'$ at random points of a Poisson
point process (density $\theta$) using a masking law. The probability for
a connected set $B$ to be included in a grain is given by

$$P(B, t) = P(B \subset X(t)) = \frac{\mu(X' \ominus B)}{\mu(X' \oplus B)} \, [1 - Q(B, t)]$$

where $Q(B, t) = \exp(-\theta t \, \mu(X' \oplus B))$.

Let $X'$ be a random disk (radius $R_i$) with unknown frequency $f_i$
(discrete case) for "n classes", and let $B(r)$ be a ball of radius $r$. Then

$$H(r) = \pi \sum_{R_i > r} f_i (R_i - r)^2$$

where

$$H(r) = \frac{-\log(Q(B(r), t)) \, P(B(r), t)}{\theta t \, [1 - Q(B(r), t)]}$$

$H(r)$ can be estimated from the images. Initially, we estimate the
value $f_n$ by calculating the size ($R_n - r = 0$) of the extreme
element of the class $n-1$ (primary grain). Next, a similar procedure
is used to estimate $f_{n-1}$, by calculating the extreme element of the
class $n-2$. We repeat the same operation until all the $f_i$ are
estimated. The number of classes (the limit of applicability) is four or
five.
Abstract - In this communication, we discuss a
noise tolerant traffic sign recognition algorithm which
can be utilized in future vehicles to warn drivers that
they are approaching traffic signs at an intersection.
It is assumed that the vehicle is equipped with a
video camera providing chromatic images of the
navigational environment. The primary objective of this
paper is to develop a noise tolerant color segmentation
algorithm and a recognition algorithm which is
tolerant to rotation, position, and scale variations.
The treatment of traffic signs by computer vision
techniques has been fairly limited in the literature.

I.  Introduction 

In noisy outdoor environments, it is not easy to obtain invariant
feature vectors from images which have rotation, position,
or scale variations. The design of a pattern recognition system
for distorted images has long been a challenging goal. In
classical pattern recognition methods, the input patterns are
required to be standard patterns, since the methods are very sensitive
to rotation, position, and scale variations, and to noise. Classical
pattern recognition systems, for example the matched filter, do
not operate well under these distortions. Distortions such as
rotation, position, and scale changes of the pattern can be
tolerated by using proper geometrical transformations. In a
real outdoor environment, the brightness [1] in images
constantly varies due to sun angle, weather, clouds, or other
conditions. This means the value of brightness is very sensitive
to the light source. In such a case, we need a pattern
recognition algorithm which is relatively insensitive to brightness
variation.

II.  System  Procedure 

The object recognition algorithm consists of two phases: noise
tolerant segmentation, and object classification invariant to
rotation, position, and scale variations. The results of color
segmentation depend not only on the segmentation algorithm, but
also on the choice of color coordinate system. In this study,
the proposed (u, v, hue) coordinate system is relatively
insensitive to brightness variation, which makes it useful for measuring the
color difference between any two arbitrary colors. The
proposed segmentation algorithm uses a split and merge concept
and an iteration method. Fig. 1 shows the procedure of the
noise tolerant segmentation algorithm.

To obtain the above invariances, the PLFT (Polar-Log-Fourier
transform) is used, which is a powerful method to implement
rotation and scale invariant mapping for 2-dimensional object
recognition. Before applying the PLFT, we have to perform position
normalization by moving the object to the center of the image.
The network for object classification is a back-propagation
network with forty-nine input nodes, one hundred hidden layer
nodes, and four output nodes.


III.  Experimental  Results 

Several images were selected to demonstrate the robustness
of this approach. Namely, do-not-enter, stop, yield, and
other signs were processed under different scale, rotation,
and shape variations. The input vector to the classification
network was a 7x7 array and the output vector had four
components corresponding to the traffic signs.

IV.  Conclusions 

A pattern recognition algorithm which is invariant to noise,
brightness variation, rotation, position, and scale change,
using a color classification technique, a geometrical transformation,
and an artificial neural network, has been developed as part
of a warning system for approaching traffic signs. The
algorithm was tested on a large number of signs with different
positions, rotations, scales, and backgrounds. The results of the
color segmentation phase were tolerant to noise and brightness
variations.


Figure  1:  Procedure  of  the  noise  tolerant  segmentation 
algorithm. 
Motion estimation (ME) is a key technique in interframe
coding. It is the basis of most video compression algorithms,
such as the CCITT standard H.261,
MPEG-2 and so on [1]. The performance of ME is determined
by two factors: (1) the estimation accuracy; (2) the
computational load.

The full search algorithm is optimal in the first
sense, but it requires extensive computation. To reduce
the computational complexity, many efficient search
algorithms have been proposed [3-6]. As described in [6],
one-step-at-a-time search (OSATS) is the second most
efficient algorithm, but it becomes inefficient when the
search window is greater than 4 pels/frame. The aim of this
paper is to overcome this disadvantage.

The OSATS algorithm [4] looks for a minimum mean
absolute error position (MMAEP) in the i-direction first, and
from there proceeds in the j-direction to find the final
MMAEP in the search window.

Building on the OSATS algorithm, VSS uses
variable steps during the search, unlike OSATS, which searches
one step at a time. The search efficiency is thereby greatly
improved. The algorithm is described as follows (Fig. 1).

Step 1: Compare the current block with the block (i, j) in the
previous frame. If the value D(i, j) of the distortion
function (in simulations, the mean absolute error (MAE) is used)
is less than a predefined threshold, then the current block is
considered a nonmoving block and the search stops.
Otherwise, go to the next step.

Step 2: Compute D(i, j), D(i, j-1) and D(i, j+1) and take the
minimum. If minimum = D(i, j-1), the block moves left; if
minimum = D(i, j+1), the block moves right; otherwise it
moves vertically. Set the search step size "p" equal to half of
the search window "w", i.e. p = w/2.

Step 3: Move the coordinate (i, j) to MMAEP(m, n), i.e. i =
m and j = n. Find MMAEP(m, n) of the coordinates (i, j),
(i, j+sp), where "s" is a sign function which is equal to "1" if
the block moves right or "-1" if the block moves left.

Step 4: If p = 1, go to Step 5, otherwise halve the step size
"p" and go to Step 3.

Step 5: Keep the j-coordinate fixed after finding the MMAEP in the
j-direction, and proceed in the i-direction in the same way as in the j-direction.

Therefore, for a maximum motion displacement of w
pels/frame, the total number of computations is
$5 + 2 \log_2 w$. A simple example is given in Fig. 1, where
w = 4.

Summary

From the description of the algorithm in the above
section, it can be seen that the two algorithms use the same
idea: first find the MMAEP in one direction and then,
keeping that coordinate fixed, find the position in the other direction with
minimum MAE, which is also the final MMAEP in the
search window. Thus, the result of motion estimation in
VSS is the same as that of OSATS. However, the
maximum number of search points (MNSP) in VSS is much
less than that in OSATS, where the MNSP is 3 + 2w.
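
Steps 1 to 5 of the variable step search can be sketched as below. This is one possible reading of the description rather than the authors' code: the names `search_axis` and `vss`, the convention that `D(i, j)` takes a candidate displacement and returns its distortion, and the toy distortion function are assumptions introduced for illustration.

```python
def search_axis(D, pos, axis, w):
    """Variable-step search along one axis (0 = i, 1 = j), starting from pos."""
    def shifted(p, delta):
        q = list(p)
        q[axis] += delta
        return tuple(q)

    centre = D(*pos)
    minus, plus = D(*shifted(pos, -1)), D(*shifted(pos, +1))
    if centre <= minus and centre <= plus:
        return pos                       # no motion along this axis (Step 2, "otherwise")
    s = -1 if minus < plus else +1       # direction in which the distortion decreases
    p = w // 2                           # initial step: half the search window
    while p >= 1:
        candidate = shifted(pos, s * p)  # Step 3: compare current point with the stepped point
        if D(*candidate) < D(*pos):
            pos = candidate
        p //= 2                          # Step 4: halve the step until p = 1 has been tried
    return pos

def vss(D, w, threshold=0.0):
    """Search the j-direction first, then the i-direction (Step 5)."""
    pos = (0, 0)
    if D(*pos) < threshold:              # Step 1: treat as a nonmoving block
        return pos
    pos = search_axis(D, pos, axis=1, w=w)
    return search_axis(D, pos, axis=0, w=w)

# Toy distortion surface with its minimum at displacement (1, 3); hypothetical data.
toy_distortion = lambda i, j: abs(i - 1) + abs(j - 3)
best = vss(toy_distortion, w=4)
print(best)
```

In a real coder `D` would compute the mean absolute error between the current block and the displaced block of the previous frame; a callable is used here only to keep the sketch self-contained.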


•  :  MMAEP  in  searching  window. 


Fig.  1  Variable  Step  Search 
Abstract - In the most general form the defocusing
operator is a linear, circularly symmetric, lowpass,
and space variant filter. In this paper, without any
restriction on the general model, the filter's bandwidth
is used as a measure of depth. The paper
introduces a simple and efficient way to obtain this measure
which is mathematically well founded and tractable.
Experimental results indicate its high capability to
resolve depth.

I.  Introduction 

Depth From Defocus methods are based on the relationship
between depth and the amount of defocus at each image point
[1-3]. For a more effective use of this idea, the most general
form of the defocusing operator should be used. The main
idea in the proposed method is to use the filter's bandwidth
at each point as a value related to depth. For well-behaved
lowpass filters, the second derivative of the frequency response at
the origin is a good measure of their effective bandwidth. This
measure is used here as a sense of depth. In the next section
the theoretical foundations of the method are explained. An
experimental result is given in Section III. Section IV concludes this
paper.

II.  Theoretical  Foundations 

To extract the second derivative of the frequency response
of the defocusing filter at each image point, the defocusing
process is analysed in small regions of the image on which the
filter can be assumed space invariant. Consider the following
functions in any small region (radius $r_m$) and in polar
coordinates $(r, \theta)$:

$i_o(r, \theta)$ : focused image

$i_i(r, \theta)$ : blurred or defocused image

$h(r, \theta) = h(r)$ : defocusing operator

Computing $i_o(r)$ and $i_i(r)$ by averaging the first two functions
with respect to $\theta$ (from 0 to $2\pi$), it can be shown [4] that the
parameter $d_i$, which is defined as


can be used instead of the second derivative of the Fourier
Transform of $i_i(r)$ at the origin. Constructing similar
parameters $d_o$ and $d_h$ for $i_o$ and $h$ respectively, it can also be
shown [4] that the parameters $d_i$, $d_o$, and $d_h$ are related by

$$d_i = d_o + d_h \qquad (2)$$

and they can also be interpreted as powers of signals having
normalized $r\,i_i(r)$, $r\,i_o(r)$, and $r\,h(r)$ as their density functions
in $[0, r_m]$. This interpretation and the additive form of (2)
give $d_i$ its mathematical tractability. In other words,
$d_i$ can be used as a sense of depth in all regions of the image
with the same $d_o$. For instance, $d_i$ can be used as a measure
of $d_h$ or depth in all regions of the image having the same
texture.
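
The additive relation (2) has a simple one-dimensional analogue that can be checked numerically. The sketch below assumes, as the "density function" interpretation above suggests, that each parameter behaves like the second moment of a normalized, symmetric profile; it is not the definition (1) from the paper, and the helper names and sample profiles are hypothetical.

```python
def normalise(f):
    """Scale a non-negative sequence so that it sums to one."""
    total = sum(f)
    return [v / total for v in f]

def second_moment(f):
    """Second moment about the centre sample of an odd-length sequence."""
    c = len(f) // 2
    return sum(v * (k - c) ** 2 for k, v in enumerate(f))

def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

focused = normalise([1, 4, 6, 4, 1])   # stand-in for the focused profile
blur = normalise([1, 2, 3, 2, 1])      # stand-in for the defocusing operator
blurred = convolve(focused, blur)      # defocused profile

d_o, d_h, d_i = (second_moment(f) for f in (focused, blur, blurred))
print(d_o, d_h, d_i)
```

Because the blurred profile is the convolution of the other two and both are symmetric about their centres, the second moments add, which mirrors the form of (2): when `d_o` is the same across regions, differences in `d_i` reflect differences in `d_h`.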

III.  Experimental  Results 

As an example for evaluating the performance of the
proposed sense, the edge texture is selected and used for
computing $d_i$. The image of a black stripe approximately 400 pixels
long and 50 pixels wide on a flat white background, tilted
against the camera, is used in a noisy environment. The
simple experimental setup is described in [4]. In Figure 1 the
normalized $d_i$ along one of the located edges in the image is
plotted as a function of the length of the stripe. Due to the
configuration of the setup, the ideal curve should be
monotonically increasing. The ability of the proposed
sense of depth, $d_i$, to resolve depth can thus be seen from this figure.

IV.  Conclusions 

In this paper, based on the most general form, the
parameter $d_i$ is introduced for sensing depth. In regions of the
image with the same texture, this parameter is a good
measure of the second derivative of the defocusing filter, or depth.
Using (1), there is no need to go to the Fourier domain and
differentiate, which is sensitive to measurement noise.
Experimental results on the edge texture indicate its ability to
resolve depth.
Abstract - The problem of fixed-rate block quantization
of an unbounded real memoryless source is studied.
It is proved that if the source has a finite sixth
moment, then there exists a sequence of quantizers
$Q_n$ of increasing dimension $n$ and fixed rate $R$ such
that the mean squared distortion $\Delta(Q_n)$ is bounded
as $\Delta(Q_n) = D(R) + O(\sqrt{\log n / n})$, where $D(R)$ is the
distortion-rate function of the source. Applications of
this result include the evaluation of the distortion
redundancy of fixed-rate universal quantizers, and the
generalization to the non-Gaussian case of a result of
Wyner on the transmission of a quantized Gaussian
source over a memoryless channel.

Shannon's source coding theorem with a fidelity criterion [1]
showed that by increasing the blocklength $n$ of a lossy source
code, it is possible to have the mean squared error approach the
distortion-rate lower bound arbitrarily closely. Pilc [3] showed
that for finite alphabet sources the convergence of the mean
squared error to the distortion-rate function occurs at a rate
$O(\log n / n)$. It has recently been shown [2] that for bounded
real memoryless sources and squared distortion this
convergence occurs at a rate $O(\sqrt{\log n / n})$. This result was used in
[2] to analyze the performance of a certain universal
quantization scheme. On the other hand, the assumption of bounded
support is sometimes a severe restriction in signal
quantization, especially since some of the most popular source models,
such as the Laplacian, have unbounded support. The
convergence rate results mentioned above also assume that binary
information is transmitted across a lossless channel. In the
present paper we eliminate the bounded support requirement
and also consider transmission across a noisy channel. In
addition we are able to obtain a rate of convergence result for
universal lossy source coding.

Theorem 1 Let $X_1, X_2, \ldots$ be a real i.i.d. source with
$E|X_1|^2 = M_2$ and $E|X_1|^6 < \infty$. Let $0 < R_1 < R_2$ and assume
that $D(R_2) > 0$. Then for any $R \in [R_1, R_2]$ there exists an
n-dimensional quantizer $Q_n$ with rate $r(Q_n) \le R$ such that

$$\Delta(Q_n) \le D(R) + B \sqrt{\frac{\log n}{n}},$$

for all $n > 1$, where the constant $B$ depends only on $R_1$, $R_2$,
and the source distribution. Furthermore, the quantizers
satisfy

$$\max_{x \in \mathbb{R}^n} \frac{1}{n} \|Q_n(x)\|^2 \le 2 M_2.$$
In [4] Wyner proved that $D_n(R) - D(R) = O(\log n / n)$
for memoryless Gaussian sources, and in [5] showed that
$D_n(R) - D(R) = O(\sqrt{\log n / n})$ for any correlated Gaussian
source with a sufficiently well-behaved spectral density.
Recently Zamir and Feder [6] showed that an $O(\log n / n)$
convergence rate is achievable for correlated Gaussian sources
by means of a variable-rate coding scheme using subtractive
dither.

Corollary 1 Suppose we are given a real memoryless source
$X_1, X_2, \ldots$ with distortion-rate function $D(R)$, satisfying
$E|X_1|^6 < \infty$, and a discrete memoryless channel of capacity
$C$, accepting one input per source output. Then there exists a
source-channel coding scheme with delay $n$, such that denoting
by $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n$ the channel decoder output, we have

$$\frac{1}{n} \sum_{i=1}^{n} E(X_i - \hat{X}_i)^2 \le D(C) + O\left(\sqrt{\frac{\log n}{n}}\right).$$

Corollary 2 For any $R > 0$, $k > 8$, and $\epsilon > 2/(k - 4)$ there
exists a sequence of universal quantizers $\{Q_n\}$ such that

and for any memoryless real source with $E|X_1|^k < \infty$
a  (Qn)  -  D(R)  =  O  (pf)(1/2)~£)  . 
Abstract - Let $\{X_i\} \sim P_\theta$, $\theta \in \Lambda \subset \mathbb{R}^k$. Rissanen
has shown that there exist universal noiseless codes
for $\{X_i\}$ with per-letter rate redundancy as low as
$\frac{k}{2} \frac{\log n}{n}$, where $n$ is the blocklength and $k$ is the
number of source parameters. We derive an analogous
result for universal quantization: for any given
Lagrange multiplier $\lambda > 0$, there exist universal fixed-rate
and variable-rate quantizers with per-letter
Lagrangian redundancy (i.e., distortion redundancy plus
$\lambda$ times the rate redundancy) as low as $\lambda \frac{k}{2} \frac{\log n}{n}$.


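Let $\{X_i\}$ be a stationary ergodic random process over
alphabet $\mathcal{X}$ with process measure $P_\theta$, $\theta \in \Lambda \subset \mathbb{R}^k$, and let $C_n =
\beta_n \circ \alpha_n$ be a length-$n$ quantizer with encoder $\alpha_n : \mathcal{X}^n \to \mathcal{S}$
and decoder $\beta_n : \mathcal{S} \to \mathcal{Y}^n$, where $\mathcal{S} = \{s_1, \ldots, s_m\} \subseteq \{0, 1\}^*$
is some binary prefix code and $\mathcal{Y}$ is the reproduction alphabet.
Let $d(x^n, y^n) = \sum_{i=1}^{n} d(x_i, y_i)$ be a single-letter fidelity criterion
and let $|s|$ denote the length of the binary string $s$. The nth
order operational distortion-rate function for $\{X_i\}$ is defined

$$D_\theta^n(R) = \inf_{C_n} \left\{ \frac{1}{n} E_\theta d(X^n, C_n(X^n)) : \frac{1}{n} E_\theta |\alpha_n(X^n)| \le R \right\},$$

where the infimum is over either fixed-rate or variable-rate
quantizers with blocklength $n$, as appropriate. The support
functional of $D_\theta^n(R)$ is defined

$$L_\theta^n(\lambda) = \inf_{C_n} \left\{ \frac{1}{n} E_\theta d(X^n, C_n(X^n)) + \lambda \frac{1}{n} E_\theta |\alpha_n(X^n)| \right\},$$

where $\lambda > 0$ is a Lagrange multiplier.

We show that there exists a universal sequence of fixed-rate
or variable-rate quantizers $\{C_n\}$ such that the per-letter
Lagrangian

$$l_\theta^n(\lambda, C_n) = \frac{1}{n} E_\theta d(X^n, C_n(X^n)) + \lambda \frac{1}{n} E_\theta |\alpha_n(X^n)|$$

converges to the support functional $L_\theta^n(\lambda)$ as $\lambda \frac{k}{2} \frac{\log n}{n}$ for every
$\theta \in \Lambda \subset \mathbb{R}^k$. To be precise, assume that for every $\theta$, $\lambda$, and
$n$, $L_\theta^n(\lambda)$ is achieved by some $C_n$, say $C_{\theta,\lambda}^n$. Then define

$$\Delta^n(\theta \| \hat\theta) = l_\theta^n(\lambda, C_{\hat\theta,\lambda}^n) - L_\theta^n(\lambda)$$

to be the divergence between the Lagrangian performance of
the quantizer matched to $\hat\theta$ and the quantizer matched to $\theta$,
with respect to $\theta$. We have the following:

Theorem 1 Let $\Lambda$ be a subset of $\mathbb{R}^k$ (bounded if we are
considering fixed-rate coding but possibly unbounded otherwise).
Suppose that for each $\theta$, $\lambda$, and $n$ there exists a code $C_{\theta,\lambda}^n$
achieving the support functional $L_\theta^n(\lambda)$. Suppose also that
the corresponding divergence $\Delta^n(\theta \| \hat\theta)$ is locally quadratic, such
that for each $\theta$ and $\lambda$ there exists a neighborhood $S_{\theta,\lambda}$ of $\theta$ and
a constant $m_{\theta,\lambda}$ such that $\Delta^n(\theta \| \hat\theta) \le m_{\theta,\lambda} \|\theta - \hat\theta\|^2$ for
$\hat\theta \in S_{\theta,\lambda}$ and for all $n$. Then for each $\lambda$ there exists a weakly
minimax universal sequence of codes $\{C_n\}$ such that for all $\theta$

$$l_\theta^n(\lambda, C_n) - L_\theta^n(\lambda) \le \lambda \frac{k}{2} \frac{\log n + c_{\theta,\lambda}}{n}.$$

If $\Lambda$ is bounded, and $S_{\theta,\lambda}$ and $m_{\theta,\lambda}$ do not depend on $\theta$, then
neither does $c_{\theta,\lambda}$, and the sequence $\{C_n\}$ is strongly minimax
universal.

Proof. Fix $\lambda$. Construct $C_n = \beta_n \circ \alpha_n$ as follows. For
each $n \ge 1$, partition $\mathbb{R}^k$ into a grid of hypercubes $\{A_i^n : i =
1, 2, \ldots\}$ each with side $1/\lceil n^{1/2} \rceil$, such that $\{A_i^n : i = 1, 2, \ldots\}$
refines $\{A_j^1 : j = 1, 2, \ldots\}$. For each hypercube $A_i^n$ that
intersects $\Lambda$, choose a representative $\theta_i^n \in A_i^n \cap \Lambda$ and its
matching quantizer $C_i^n = C_{\theta_i^n,\lambda}^n$. Then define the encoder $\alpha_n$ to
map $x^n$ to the string $s = s_i' s_i'' s_i'''$, where $s_i'$ represents the unit
hypercube $A_j^1$ containing $A_i^n$ (which can be a fixed-length
string if $\Lambda$ is bounded), $s_i''$ represents the hypercube $A_i^n$
indexed within $A_j^1$ (which is a fixed-length string with length
$\log \lceil n^{1/2} \rceil^k$), and $s_i'''$ is the string $\alpha_i^n(x^n)$ representing $x^n$
using the quantizer $C_i^n$. The decoder maps $s$ to the reproduction
$y^n = \beta_i^n(s_i''')$. The index $i$ is chosen to minimize the
instantaneous Lagrangian $d(x^n, C_i^n(x^n)) + \lambda |s_i' s_i'' s_i'''|$. Thus

$$d(x^n, C_n(x^n)) + \lambda |\alpha_n(x^n)| = \min_i \left[ d(x^n, C_i^n(x^n)) + \lambda |s_i' s_i'' \alpha_i^n(x^n)| \right]
\le d(x^n, C_j^n(x^n)) + \lambda |s_j' s_j'' \alpha_j^n(x^n)|$$

for any particular $j$. Let $j$ be the index of the cell $A_j^n$
containing $\theta$. Then dividing by $n$, taking expectations, and
subtracting $L_\theta^n(\lambda)$, we have

$$l_\theta^n(\lambda, C_n) - L_\theta^n(\lambda) \le l_\theta^n(\lambda, C_j^n) - L_\theta^n(\lambda) + \frac{\lambda}{n} |s_j' s_j''|
\le \Delta^n(\theta \| \theta_j^n) + \frac{\lambda}{n} \left( b_\theta + \frac{k}{2} \log n \right),$$

for some constant $b_\theta$. By assumption, $\Delta^n(\theta \| \hat\theta) \le m_{\theta,\lambda} \|\theta - \hat\theta\|^2$
for all $\hat\theta$ in a neighborhood $S_{\theta,\lambda}$ of $\theta$. Since $\theta_j^n \to \theta$ with
$\|\theta - \theta_j^n\|^2 \le k/n$, there exists a constant $a_{\theta,\lambda}$ such that
$\Delta^n(\theta \| \theta_j^n) \le a_{\theta,\lambda} k/n$ for all $n$. Thus the theorem is proved
with $c_{\theta,\lambda} = 2 a_{\theta,\lambda}/\lambda + 2 b_\theta/k$. □

A simple example of a source satisfying the conditions
of the theorem is the following. Let $Z_1, Z_2, \ldots$ be an
arbitrary real-valued stationary ergodic process with mean 0
and variance 1, and let $X_i = \sigma Z_i + \mu$. Then with $\theta =
(\mu, \sigma) \in \Lambda \subset \mathbb{R}^2$, under the squared-error distortion
measure and fixed-rate quantization of $\{X_i\}$, for all $\lambda$, $n$, $\theta$, and
$\hat\theta$, $\Delta^n(\theta \| \hat\theta) \le \|\theta - \hat\theta\|^2$. Hence for any stationary source with
unknown mean and variance in a bounded set, there exists a
strongly minimax universal sequence of fixed-rate quantizers
for which the nth order Lagrangian redundancy is at most
$\lambda (k/2)(\log n + c)/n$, where $k = 2$.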
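
The two-stage construction used in the proof (describe which candidate quantizer is used, then send that quantizer's output, choosing the candidate that minimizes the instantaneous Lagrangian) can be illustrated with a toy sketch. Everything concrete here is an assumption made for the example: the scalar nearest-neighbour codebooks stand in for the matched codes, the header is a plain fixed-length index rather than the hypercube description, and the names `quantize` and `two_stage_encode` are hypothetical.

```python
import math

def quantize(block, codebook):
    """Nearest-neighbour scalar quantization; returns (reproduction, squared-error distortion)."""
    repro = [min(codebook, key=lambda c: (x - c) ** 2) for x in block]
    return repro, sum((x - y) ** 2 for x, y in zip(block, repro))

def two_stage_encode(block, codebooks, lam):
    """Pick the codebook index minimizing distortion + lam * (header bits + payload bits)."""
    header_bits = math.ceil(math.log2(len(codebooks)))
    best = None
    for i, cb in enumerate(codebooks):
        repro, dist = quantize(block, cb)
        payload_bits = len(block) * math.ceil(math.log2(len(cb)))
        cost = dist + lam * (header_bits + payload_bits)
        if best is None or cost < best[0]:
            best = (cost, i, repro)
    cost, index, repro = best
    return index, repro, cost

# Two candidate codebooks matched to sources with different means (hypothetical data).
codebooks = [[-1, 1], [9, 11]]
index, repro, cost = two_stage_encode([10.2, 9.1, 10.9], codebooks, lam=0.1)
print(index, repro, round(cost, 3))
```

The header cost plays the role of the $|s' s''|$ term in the proof: it is the price of universality, and it is amortized over the block length.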
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Abstract  -  It  is  shown  that  as  rate  increases  the 
problem  of  asymptotically  optimal  scalar  quantization  has 
polynomial-time  (or  space)  encoding  complexity  if  the 
distribution  function  corresponding  to  the  one-third  power 
of  the  source  density  is  polynomial-time  (or  space) 
computable  in  the  Turing  sense. 

I.  INTRODUCTION 

Shannon's distortion-rate theory describes the optimal
tradeoff between rate and distortion of vector quantizers.
While it does not address the question of complexity,
generally speaking, it is evident that quantizers need to become
very complex in order to approach the optimal performance
tradeoff, namely, the distortion-rate function. It is well
known that full-search unstructured quantizers with
dimension k and rate R have storage and arithmetic complexity
increasing exponentially with the dimension-rate product kR.
However, there are many reduced complexity full-search
methods, and the question of how fast complexity must increase as
performance approaches the distortion-rate function is open.
Moreover, there are many structured vector quantization
techniques whose complexities are substantially less than that of
full search, but whose performance does not approach the
distortion-rate function. It is unclear whether there exist
structured quantizers with significantly reduced complexity and
distortion close to the optimal.

The  approach  taken  in  this  paper  is  to  consider  how  the 
complexity  of  (asymptotically)  optimal  quantization  with  a 
given  dimension  k  increases  with  rate  R.  Specifically,  as  an 
initial  effort,  we  focus  on  the  encoding  complexity  of  scalar 
quantization. 


II.  PROBLEM  FORMULATION 

In stating and deriving the main result we adopt a Turing-like
framework for evaluating complexity. Instead of
assuming a different encoding machine for each R, whose
relative complexities would be difficult, if not impossible, to
assess, we envision one machine, namely an oracle Turing
machine M, cf. [1,2], that is capable of encoding at any
integer rate. That is, when rate R is specified, its output in
response to a source sample x is an index $i$, $1 \le i \le 2^R$. We
let d(M,p,R) denote the mean-squared error (MSE) that results
when this Turing encoder is used with an optimum decoder.

In the context of encoding, an oracle Turing machine
consists of a finite-state machine, an unlimited tape memory
and an oracle that provides a dyadic approximation to the
source sample x to the required precision. The time (space)
complexity of encoding at rate R with this machine, denoted
c(M,R), is the maximum number of steps (alternatively, the
maximum amount of tape memory) required to encode an
arbitrary input sample.

We say a source density p is asymptotically optimally
quantizable in polynomial time (or space), abbreviated
PTIME-AOQ (or PSPACE-AOQ), if there exists a Turing
encoder M and a polynomial g such that

$$c(M,R) \le g(R) \quad \forall R \in \mathbb{Z}^+, \qquad \text{and} \qquad \frac{d(M,p,R)}{D(p,R)} \to 1 \ \text{as} \ R \to \infty,$$

where D(p,R) is the mean-squared error of the optimum
quantizer of rate R.


Intuitively, it is easy to see that some source densities are
intrinsically easier to quantize than others. For instance,
sources with uniform density can be optimally quantized by
simple uniform quantizers. On the other hand, it is also
known that the optimal quantization point density for a given
source is directly related to the one-third power of the source
density. Therefore, it seems reasonable that the possibility of
optimal quantization with polynomial complexity should
depend on the “complexity” of the desired point density. In
order to rigorously analyze this relationship, we adopt the
framework of Turing complexity for real-valued functions, cf.
[2]. In this theory, a real-valued function $f$ is said to
be polynomial-time (space) computable if there is an oracle
Turing machine M that is capable of providing, for any x,
a dyadic approximation to $f(x)$ to within an error of $2^{-n}$ for
any pre-specified integer n, and its time (space) complexity is
bounded from above by a polynomial function of n.

We  are  now  ready  to  present  the  main  results  of  this  paper. 


III.  RESULTS 


Proposition 1: Suppose $\lambda(x)$ is a desired quantization
point density such that $\int p(x)/\lambda(x)^2 \, dx < \infty$ and the function
$\Lambda(x) = \int_{-\infty}^{x} \lambda(y) \, dy$ is polynomial-time (alternatively, space)
computable. Then there exists a Turing encoder M that runs in
polynomial time (space) and with the resulting MSE
satisfying

$$\lim_{R \to \infty} \frac{d(M,p,R)}{D(\lambda,p,R)} = 1,$$

where $D(\lambda,p,R) = (2^{-2R}/12) \int p(y)/\lambda(y)^2 \, dy$ is the Bennett
integral prediction for the MSE of a quantizer with a given
point density.


Corollary 2: If the source density p is such that the
function $F(x) = \int_{-\infty}^{x} c \, p(y)^{1/3} \, dy$, where $c = (\|p\|_{1/3})^{-1/3}$, is
polynomial-time (space) computable, then p is PTIME-AOQ
(PSPACE-AOQ).

By  applying  Corollary  2,  one  can  easily  show  that 
Gaussian,  Laplacian  and  uniform  source  densities  with  zero 
means  and  unit  variances  are  PTIME-AOQ  and  PSPACE-AOQ. 
On  the  other  hand,  it  is  also  possible  to  construct  a  source 
density  p  for  which  the  function  F  in  Corollary  2  is  not 
computable  in  polynomial  time. 
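
To make the role of the function F in Corollary 2 concrete, the following sketch encodes a sample by evaluating F and scaling by $2^R$, for a zero-mean, unit-variance Laplacian density. It is only an illustration under stated assumptions: it uses ordinary floating-point arithmetic instead of an oracle Turing machine with dyadic approximations, and the names `point_density_cdf` and `encode` as well as the index rule `floor(2^R F(x)) + 1` are choices made for this example, not taken from the paper.

```python
import math

# Unit-variance Laplacian: p(x) = (1/sqrt(2)) * exp(-sqrt(2)|x|), so p(x)^(1/3) is
# proportional to exp(-A|x|) with A = sqrt(2)/3, and F is a Laplacian-shaped CDF.
A = math.sqrt(2.0) / 3.0

def point_density_cdf(x):
    """F(x): normalized integral of p(y)^(1/3) from -infinity to x."""
    if x < 0:
        return 0.5 * math.exp(A * x)
    return 1.0 - 0.5 * math.exp(-A * x)

def encode(x, R):
    """Map a sample to an index in 1..2^R using the companding rule."""
    levels = 1 << R
    return min(levels, math.floor(levels * point_density_cdf(x)) + 1)

indices = [encode(x, R=3) for x in (-100.0, -1.0, 0.0, 1.0, 100.0)]
print(indices)
```

The cost of this encoder is dominated by evaluating F to about R bits of precision, which is the intuition behind tying the encoding complexity to the computability of F.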
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I.  Introduction 

In this paper we study the waveform coding problem where
the data source symbols have a distribution that is
simultaneously highly peaked and very long tailed, a situation in which the
source entropy is small, but the coding process must deal with
a very large number of symbols. This type of problem can be
found, for example, in the lossless compression of medical
images. Those images are digitized with 10-12 bpp, and they are
commonly quite smooth. With an adequate reversible
transformation (e.g., linear prediction) we have a large fraction of
pixel values near zero, but a significant number of pixels have
very large magnitudes.

There are many practical difficulties when the data
alphabet is large, which get much worse if we try to exploit the
statistical dependence left between the source samples by, for
example, coding several symbols together or designing
conditional codes. Several ad hoc methods have been devised to deal
with the problems caused by large alphabets. For instance, a
popular method uses an “overflow” symbol to indicate which
symbols are too large and should be coded separately.

II.  The  Alphabet  Partitioning  Method 

We  study  a  method  to  reduce  the  coding  complexity  when 
the  source  alphabet  is  large,  based  on  the  following  coding 
strategy: 

•  the source alphabet is partitioned into a relatively small
number of sets, with the number of elements in a set
equal to a power of two;

•  each symbol is coded in two steps: first the number of
the set to which the symbol belongs (called the set
number) is coded; afterwards the number of that particular
source symbol inside that set (the set index) is coded;

•  when coding the pair (set number, set index) the set
number is entropy-coded with a powerful and complex
method, while the set index is left uncoded, i.e., its
binary representation is stored or transmitted.
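
The coding strategy in the list above can be sketched with one simple partition. The particular partition used here (set 0 holds the symbol 0, and set n for n >= 1 holds the symbols from 2^(n-1) to 2^n - 1) is an assumption chosen for illustration, as are the function names; the text only requires that each set have a power-of-two size.

```python
def split_symbol(v):
    """Return (set number, set index, raw bits for the index) for a non-negative symbol."""
    n = v.bit_length()            # 0 for v == 0
    if n == 0:
        return 0, 0, 0
    base = 1 << (n - 1)           # smallest symbol in set n
    return n, v - base, n - 1     # set n has 2^(n-1) elements, so n-1 raw bits suffice

def join_symbol(set_number, set_index):
    """Inverse of split_symbol: rebuild the symbol from its (set number, set index) pair."""
    if set_number == 0:
        return 0
    return (1 << (set_number - 1)) + set_index

# A 12-bit alphabet (4096 symbols) collapses to 13 set numbers for the entropy coder.
set_numbers = {split_symbol(v)[0] for v in range(4096)}
print(len(set_numbers), split_symbol(5))
```

Only the set number would be passed to the entropy coder; the set index is written as raw bits, which is where the savings in complexity come from.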

The advantage of this scheme is that it is normally possible
to find partitions that simultaneously allow large reductions
in the coding complexity with only a very small loss in
compression ratio. This partitioning technique is quite
similar to the definition of “buckets” used in [1] for complexity
reduction. The set numbers correspond to the bucket
numbers, and can also be used to simplify context-based coding.
However, here they have an additional purpose: they allow
part of the information to be left uncoded, with obvious
advantages in speed and complexity. Furthermore, we show the
advantages with methods that entropy-code several symbols
together (e.g., Huffman, Lempel-Ziv).

1 This work was supported by CNPq, Conselho Nacional de
Desenvolvimento Científico e Tecnológico, Brazil.

To evaluate the loss incurred by leaving the set index uncoded,
we assume a source with M symbols, each with probability
$p_i$, $i = 1, \ldots, M$. The source entropy is denoted by
H. Partitioning the source symbols into nonempty sets $S_n$,
$n = 1, 2, \ldots, N$, we denote the number of elements in $S_n$ by
$M_n = 2^{K_n}$. We show that the expression for the maximum
loss due to leaving the set index uncoded is

$$\Delta H = \sum_{n=1}^{N} \sum_{i \in S_n} p_i \log_2 \frac{M_n p_i}{P_n}, \qquad (1)$$

where

$$P_n = \sum_{i \in S_n} p_i \qquad (2)$$

is the probability that the symbol belongs to the set with
number n.

Equation  (1)  shows  that,  for  each  set,  the  loss  should  be 
small  under  two  circumstances: 


1.  $M_n p_i \approx P_n$, that is, the distribution inside the set is
approximately uniform;

2.  the  contribution  of  the  set  n  to  the  entropy  is  very  small. 
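
The loss from leaving the set index uncoded can be checked numerically. The sketch below is an illustration only: the function names and the example distribution are assumptions, not taken from the paper. It evaluates the closed-form loss $\sum_n \sum_{i \in S_n} p_i \log_2 (M_n p_i / P_n)$ and compares it with the direct definition, i.e. the ideal rate of the two-step code (entropy-coded set number plus $K_n$ raw bits) minus the source entropy.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def partition_loss(p, sets):
    """Excess bits per symbol when the set index is sent uncoded.

    p    : list of symbol probabilities
    sets : list of lists of symbol indices; every set size is a power of two
    """
    loss = 0.0
    for members in sets:
        m = len(members)
        assert m > 0 and m & (m - 1) == 0, "set size must be a power of two"
        p_set = sum(p[i] for i in members)
        for i in members:
            if p[i] > 0:
                loss += p[i] * math.log2(m * p[i] / p_set)
    return loss

def two_step_rate(p, sets):
    """Ideal rate: entropy of the set number plus log2(set size) raw bits."""
    set_probs = [sum(p[i] for i in members) for members in sets]
    raw_bits = sum(P * math.log2(len(members))
                   for P, members in zip(set_probs, sets))
    return entropy(set_probs) + raw_bits

# Hypothetical 4-symbol source split into two sets of two symbols each.
p = [0.4, 0.1, 0.25, 0.25]
sets = [[0, 1], [2, 3]]
print(partition_loss(p, sets))               # excess rate, bits per symbol
print(two_step_rate(p, sets) - entropy(p))   # same quantity, from the definition
```

In this example the whole loss comes from the first set, whose two symbols are far from equiprobable; the second set is uniform and contributes nothing.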

Using  the  approach  summarized  above  we  consider  the  al¬ 
ternatives  to  find  the  best  partitions  for  a  given  source,  and 
analyze  the  trade-off  between  the  coding/decoding  complexity 
and  the  compression  efficiency.  Numerical  results  make  clear 
the  advantages  of  the  alphabet  partitioning  method  when  used 
for  the  lossless  compression  of  medical  images.  They  show  that 
there  are  simple  and  efficient  methods  to  define  the  partitions, 
and  those  are  quite  versatile,  i.e.,  they  can  be  efficiently  used 
for  several  images  of  the  same  type. 
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Abstract  —  We  present  some  results  on  quantiza¬ 
tion  of  a  narrowband  Gauss- Markov  process.  The 
narrowband  process  is  modeled  as  a  lowpass  complex 
envelope  with  a  state  space  description.  We  com¬ 
pare  the  performance  of  narrowband  process  quanti¬ 
zation  schemes  with  several  other  previously  analyzed 
schemes. 

I.  Introduction 

We  present  some  results  on  quantization  of  a  narrowband 
Gauss-Markov  process.  The  narrowband  process  is  modeled 
in  a  state  space  framework  and  several  schemes  for  tracking 
the inphase and the quadrature components of the narrowband
process are considered. These inphase and quadrature
components  are  baseband  processes,  and  thus  are  much  more 
slowly  varying  than  the  original  narrowband  process.  We  com¬ 
pare  the  performance  of  these  schemes  with  several  previously 
analyzed  schemes  [1,  2]  both  with  respect  to  a  time  aver¬ 
aged  smoothed  error,  and  their  robustness  with  respect  to  the 
changes  in  the  input  spectrum.  Finally,  we  present  an  analysis 
of  the  case  when  this  narrowband  process  is  input  to  a  sigma- 
delta  modulator.  By  performing  an  approximate  analysis,  we 
arrive  at  results  which  are  applicable  for  a  large  class  of  inputs 
and  are  consistent  with  other  more  rigorous  analyses  [3,  4]. 

II.  Modeling  of  Narrowband  Process 

If x(t) is the original narrowband process, the complex envelope
is given by $\tilde{x}(t) = x_c(t) + j x_s(t)$, where $x_c(t)$ and $x_s(t)$
are respectively the inphase and the quadrature components
of  the  narrowband  process.  Once  we  obtain  the  inphase  and 
the  quadrature  components,  we  quantize  them  independently, 
or  together  by  considering  them  as  a  complex  state.  These 
quantized  values  are  then  used  to  find  an  estimate  of  the  orig¬ 
inal  narrowband  process.  We  consider  three  envelope  quanti¬ 
zation  schemes:  i)  scalar  quantization  of  the  inphase  and  the 
quadrature  components  (complex  components),  ii)  differential 
quantization of the complex components, and iii) quantization
of the complex state $S(t) = [x_c(t) \; x_s(t)]^T$. In [1, 2] we
analyzed  several  source  coding  schemes  for  a  continuous  time 
Gauss-Markov  process.  By  fixing  the  overall  transmission  rate 
we  compared  the  smoothed  error  performance  of  these  source 
coding  systems.  By  performing  an  identical  analysis  for  the 
envelope  quantization  schemes,  we  can  evaluate  the  optimal 
tradeoff  between  sampling  interval  and  the  quantization  levels 
for  the  envelope  quantization  schemes. 

III.  Smoothed  Error  Performance 

We  compare  the  performance  of  the  envelope  quantization 
schemes  with  differential  state  quantization  for  a  second  order 
narrowband  process.  The  hierarchy  of  performance  within  the 
different  schemes  is  (from  best  to  worst)  :  i)  differential  state 
quantization,  ii)  differential  quantization  of  the  complex  state, 
iii)  differential  quantization  of  the  inphase  and  the  quadrature 
components  of  the  complex  envelope,  iv)  scalar  quantization 


of  the  inphase  and  the  quadrature  components,  and  v)  state 
vector  quantization.  Thus  the  envelope  quantization  schemes 
perform  better  than  state  vector  quantization  but  worse  than 
differential  state  quantization.  Differential  state  quantization 
considers the quantization of a state consisting of the process
and its derivatives and was shown to be a superior quantization
scheme for Gauss-Markov processes [2].

IV.  Comparison  of  Robustness 

We  define  a  measure  of  robustness  for  different  source  coding 
systems,  when  the  input  spectrum  changes.  We  encounter 
many  situations  where  the  input  process  is  changing  at  regular 
intervals.  In  such  situations  the  performance  of  the  system 
which  is  designed  for  one  particular  input  process  deteriorates 
as  the  input  spectrum  changes  from  its  original  value.  We 
model  this  change  in  terms  of  the  state  space  matrices  A  and 
B  and  quantify  the  deterioration  which  accrues  due  to  the 
changes  in  the  spectrum.  We  show  that  for  a  second  order 
narrowband  process,  for  an  N  level  two  dimensional  vector 
quantizer,  the  normalized  change  in  the  smoothed  error  due 
to  the  changes  in  the  input  spectrum  for  both  the  schemes 
(differential  state  quantization  [2]  and  differential  quantization 
of  the  complex  state)  is  approximately  proportional  to  . 
The  changes  in  the  smoothed  error  are  marginally  less  in  the 
differential  quantization  of  the  complex  state. 

V.  Results  on  Sigma-Delta  Modulation 

Finally,  we  present  a  simple  analysis  of  the  case  when  the 
narrowband  process  is  input  to  a  sigma-delta  modulator.  We 
develop  an  approximate  theory  to  analyze  the  quantization 
noise  spectra  of  a  sigma-delta  modulator  when  the  input  is 
this  narrowband  Gauss-Markov  process  of  any  arbitrary  order. 
The  quantization  noise  spectra  can  always  be  approximated 
by  a  simple  closed  form  expression  in  terms  of  state  space 
matrices. 
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Abstract  —  In  this  paper,  a  statistical  estima¬ 
tion  framework  is  proposed  for  adaptive  quantization 
based  on  causal  past.  Different  estimation  methods 
are  given  for  the  marginal  density  based  on  the  quan¬ 
tized  sample.  For  a  stationary  and  ergodic  source  pro¬ 
cess,  if  its  marginal  density  is  in  a  parametric  family 
with  a  dimension  less  than  the  quantization  level,  then 
“adaptation”  can  be  achieved  when  the  sample  size  is 
large,  i.e.,  the  marginal  density  can  be  estimated  con¬ 
sistently. 

Summary 

Adaptive lossless encoding/decoding based on the causal past
is equivalent to the predictive version of Rissanen's MDL (1989)
or the prequential approach to statistical inference of Dawid
(1984). In other words, at time N + 1, a lossless code can be
designed based on the causal past data $x^N = (x_1, x_2, \ldots, x_N)$
to encode the next data point $x_{N+1}$. Since the causal past
data is available to both the encoding and decoding ends, and
as long as both ends agree on the same lossless coding rule
depending on the causal past data, the encoding and decoding
can be done "on the fly". In statistical terms, a lossless
code based on the causal past amounts to a predictive density
for $x_{N+1}$ based on $x^N$. Such a predictive density can be
obtained either parametrically or non-parametrically. The
parametric predictive density can simply be the plug-in density
estimator $f(\cdot \mid \hat{\theta}_N)$, where $f(\cdot \mid \theta)$ is a pre-determined
parametric family such as the Gaussian family with unknown mean
and variance, and $\hat{\theta}_N$ is a good estimator (e.g. the maximum
likelihood estimator) of $\theta$ based on $x^N$. On the other hand,
the nonparametric predictive density can be any good
nonparametric density estimator based on $x^N$, for example the
kernel or log-spline density estimators. In this paper, we show
that a parallel estimation theory can be established based on
quantized or lossy data.

Recently,  Ortega  and  Vetterli  (1994)  proposed  an  adaptive 
quantization  algorithm  based  on  the  causal  past  and  they  also 
presented  convincing  experimental  results.  Their  approach 
differs  from  other  adaptive  quantization  in  that  there  is  no  sep¬ 
arate  training  data  set  needed  for  the  quantization  -  the  quan¬ 
tizer  is  re-designed  sequentially  based  on  causal  past  quantized 
sample.  Following  Ortega  and  Vetterli  (1994),  we  divide  the 
problem  into  an  estimation  part  and  a  quantization  part.  For 
the  former,  the  underlying  density  is  estimated  based  on  the 
causal  quantized  data,  and  for  the  latter  a  new  (optimal)  quan¬ 
tization  algorithm  (e.g.  Lloyd-Max)  is  designed  based  on  the 
estimated  density.  Here  we  concentrate  mainly  on  the  estima¬ 
tion  part. 

We now describe a statistical estimation model which lends
itself to a theoretical analysis. This model is a good
approximation to situations where the quantization levels are
stabilized, and these levels don't have to be the optimal levels
based on the unknown density.

Let an L-level quantization of [a, b] (which can be an infinite
interval) correspond to an (interval) partition $\{A_i\}_{i=1}^{L}$,
and assume we only observe the quantized causal past data
$x^N$, i.e., we observe only the indicators $I\{x_j \in A_i\}$. Denote by
$n_i(N)$ ($i = 1, \ldots, L$) the counts of x's falling into intervals $A_i$.
Assume the source process is stationary and ergodic with a
k-dimensional parametric marginal density $f(x \mid \theta)$ ($\theta \in R^k$).
Let $P_i = P_i(\theta) := \int_{A_i} f(x \mid \theta)\,dx$. Under regularity conditions
on the parametric family, when k = L, the above equations
uniquely determine $\theta$ in terms of the $P_i$'s: $\theta = g(P_1, \ldots, P_L)$.
By the Ergodic Theorem, for N large, $n_i/N \approx P_i$; hence
$\hat{\theta} = g(n_1/N, \ldots, n_L/N)$ tends to the true $\theta$ as N gets large.

That is, a quantized sample leads to consistent estimation of
the unknown density when the source is stationary and ergodic,
as long as the marginal density is parametric with
dimension less than the level of quantization: "one needs at
least the number of equations as the number of unknowns."
(In the case that k < L, we solve for $\theta$ using the k equations
corresponding to the k largest $n_i/N$; or we minimize
$\sum_i (P_i(\theta) - n_i/N)^2$.) When the CLT holds for the stationary
process, the asymptotic normality of $\hat{\theta}$ is expected.

If we further assume that the source process is memoryless,
then the maximum likelihood method can be used to estimate $\theta$
based on the $n_i$'s: $\hat{\theta}_{MLE} = \arg\max_{\theta} \prod_i P_i(\theta)^{n_i}$. In particular, the
Monte-Carlo EM (Expectation-Maximization) or data augmentation
algorithm (cf. Wei-Tanner, 1990) can be used to
find $\hat{\theta}_{MLE}$ if we view the unobserved x's as the complete data
and the $n_i$'s as the observed incomplete data. This algorithm
is especially useful when the MLE based on the complete data
has a closed form, such as in the case of the Gaussian family.
Moreover, the linear-interpolation estimation method in Ortega
and Vetterli (1994) can be viewed in our framework as
follows: let $\theta = (f(a_1), \ldots, f(a_k))$, where the a's are pre-chosen
points in [a, b], for example, centers of the $A_i$'s. Then $f(\cdot \mid \theta)$ is the
density determined by linearly interpolating the $f(a_i)$'s.
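
To make the estimation step concrete, here is a minimal sketch of maximum likelihood estimation from cell counts alone. It assumes a memoryless Gaussian source with unknown mean and standard deviation, and it maximizes the multinomial log-likelihood $\sum_i n_i \log P_i(\theta)$ with a plain compass (pattern) search. The optimizer, the function names and the example cell edges are all illustrative assumptions; the summary itself suggests Monte-Carlo EM rather than this direct search.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cell_probs(edges, mu, sigma):
    """P_i(theta) for the cells (-inf, e_1], (e_1, e_2], ..., (e_last, +inf)."""
    cdf = [0.0] + [normal_cdf(e, mu, sigma) for e in edges] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]

def log_likelihood(counts, edges, mu, sigma):
    probs = cell_probs(edges, mu, sigma)
    # Clamp to avoid log(0) when a cell probability underflows.
    return sum(n * math.log(max(p, 1e-300)) for n, p in zip(counts, probs))

def fit(counts, edges, mu=0.0, sigma=1.0, step=1.0, tol=1e-5):
    """Compass search: try +/- step in each coordinate, halve step when stuck."""
    best = log_likelihood(counts, edges, mu, sigma)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            m, s = mu + d_mu, sigma + d_sigma
            if s <= 0.0:
                continue
            value = log_likelihood(counts, edges, m, s)
            if value > best:
                best, mu, sigma, improved = value, m, s, True
        if not improved:
            step /= 2.0
    return mu, sigma

# Hypothetical 5-level quantizer; counts are the expected counts for N = 10000.
edges = [-1.0, 0.0, 1.0, 2.0]
counts = [10000 * p for p in cell_probs(edges, 0.5, 1.5)]
print(fit(counts, edges))   # should land close to (0.5, 1.5)
```

With k = 2 parameters and L = 5 cells there are more equations than unknowns, so the parameters are identifiable from the counts even though no raw sample is ever observed.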

Currently  under  investigation  are  non-parametric  estima¬ 
tion  methods  and  using  MDL  to  select  window  sizes  on  which 
the  causal  past  is  based.  Simulation  studies  are  also  planned 
to  test  the  estimation  methods  when  used  together  with  a 
quantization  algorithm  such  as  the  Lloyd-Max  algorithm. 
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Abstract  —  We  describe  an  efficient  probability 
quantization  scheme  for  binary  arithmetic  code  im¬ 
plementation.  We  show  that  this  scheme  is  simple  to 
implement  and  has  better  compression  efficiency  than 
some  existing  schemes. 

I.  Introduction 

The binary arithmetic code is a crucial element of many practical
state-of-the-art lossless and lossy compression schemes.
The key to an efficient implementation of the binary arithmetic
coding procedure is to avoid performing the time-consuming
multiplication and division operations in the probability
update for each binary symbol sent. IBM's QM-coder
[1] keeps track of two fixed-length registers A and C, where
A represents the size of the current interval, and C indicates
the base of the current interval. By means of a normalization
process A and C are kept within a specific range. By a
simple approximation that requires A to be in the range
0.75 < A < 1.5, the QM-coder replaces multiplications with
simple additions and subtractions. Using a binary entropy
argument, the worst-case efficiency can be shown to be about
97.0%. Langdon et al. proposed a more intuitive approach to
perform binary arithmetic coding [2]. Langdon's binary arithmetic
coding procedure keeps track of two values high and
low, where high and low correspond to the top and the bottom
of the current interval. Langdon suggested constraining
the probability of the less probable symbol to the nearest
integral power of 1/2, so that multiplications can be replaced by
simple shifts. The worst-case efficiency of Langdon's binary
arithmetic code can be shown to be about 95.0%.

II.  Probability  Quantization  Scheme 

In this article we improve upon Langdon's results by approximating
the probability of the less probable symbol with a
fraction of the form $2^{-l}$ or $2^{-l-1} + 2^{-l-2}$ for $l = 1, 2, \ldots$. It is
easy to show that multiplying a number by $2^{-l-1} + 2^{-l-2}$ is
equivalent to right-shifting it by $l + 0.415$ bits. Computationally
this corresponds to replacing a multiplication operation
with 2 shifts and an add. As we will show later, this scheme
improves the worst-case coding efficiency to 98.5%.
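
The following sketch illustrates the idea under stated assumptions: it enumerates the allowed probabilities $2^{-l}$ and $2^{-l-1} + 2^{-l-2}$, picks for a given true probability p the candidate with the shortest average code length, and evaluates the efficiency (entropy divided by average code length) on a grid. The function names, the grid and the cutoff `max_l` are hypothetical choices made for illustration, and the integer helper only shows the "two shifts and an add" replacement, not a full arithmetic coder.

```python
import math

def candidates(max_l=12):
    """Allowed LPS probabilities: 2^-l and 2^-(l+1) + 2^-(l+2), l = 1..max_l."""
    qs = []
    for l in range(1, max_l + 1):
        qs.append(2.0 ** -l)
        qs.append(2.0 ** -(l + 1) + 2.0 ** -(l + 2))
    return qs

def code_length(p, q):
    """Average bits per symbol when true LPS probability p is coded as q."""
    return -p * math.log2(q) - (1.0 - p) * math.log2(1.0 - q)

def quantize(p, qs):
    """Candidate that minimizes the average code length for p."""
    return min(qs, key=lambda q: code_length(p, q))

def efficiency(p, qs):
    h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return h / code_length(p, quantize(p, qs))

def scale(a, l):
    """Integer a times (2^-(l+1) + 2^-(l+2)) using two shifts and one add."""
    return (a >> (l + 1)) + (a >> (l + 2))

QS = candidates()
grid = [0.02 + 0.0005 * k for k in range(961)]   # p from 0.02 up to 0.5
worst = min(efficiency(p, QS) for p in grid)
print(quantize(0.4, QS), scale(1024, 1), round(worst, 3))
```

On this grid the lowest efficiency occurs near p = 0.31, the boundary between the candidates 0.25 and 0.375, which is consistent with the 98.5% worst case quoted in the text.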

The following is a sketch of how to optimally quantize
the probability of the less probable symbol to achieve the
aforementioned computational efficiency. We use an approach
similar to Langdon's. Let p, 0 < p < 0.5, be the true
probability of the less probable symbol. The question is
how to choose a step-wise probability quantization function Q(p)
of p such that the average code length per symbol, namely
$-p \log_2 Q(p) - (1 - p) \log_2(1 - Q(p))$, is minimized. The design
of Q(p) is complexity-driven, not performance-driven. However,
as we will show later, we do not sacrifice much
by quantizing p into the form $2^{-l}$ or $2^{-l-1} + 2^{-l-2}$ for
$l = 1, 2, \ldots$. We examine two different cases.

¹This work was carried out by the Jet Propulsion Laboratory,
California Institute of Technology, under a contract with the
National Aeronautics and Space Administration.

Case 1: $2^{-l-1} + 2^{-l-2} < p < 2^{-l}$

This corresponds to finding the breakpoint $p'$ such that

$$p'(l + 0.41504) - (1 - p') \log_2(1 - 2^{-l-1} - 2^{-l-2}) = p' l - (1 - p') \log_2(1 - 2^{-l})$$

Case 2: $2^{-l-1} < p < 2^{-l-1} + 2^{-l-2}$

This corresponds to finding the breakpoint $p'$ such that

$$p'(l + 1) - (1 - p') \log_2(1 - 2^{-l-1}) = p'(l + 0.41504) - (1 - p') \log_2(1 - 2^{-l-1} - 2^{-l-2})$$

We use the same performance efficiency definition as Langdon's,
which is given by the entropy as a fraction of the average
code length,

$$\text{efficiency} = \frac{-p \log_2 p - (1 - p) \log_2(1 - p)}{-p \log_2 Q(p) - (1 - p) \log_2(1 - Q(p))}.$$


We tabulate the optimal probability range and the worst-case
efficiency for each quantized probability value (Figure 1).

Probability range   Right-Sft   Quantized prob.   Efficiency
0.437 - 0.500       1           0.5               0.988
0.310 - 0.437       1.415       0.375             0.985
0.218 - 0.310       2           0.25              0.994
0.155 - 0.218       2.415       0.1875            0.991
0.109 - 0.155       3           0.125             0.996
0.077 - 0.109       3.415       0.09375           0.994
0.054 - 0.077       4           0.0625            0.997
0.039 - 0.054       4.415       0.046875          0.995
0.027 - 0.039       5           0.03125           0.998
0.019 - 0.027       5.415       0.0234375         0.996
0.014 - 0.019       6           0.015625
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Abstract  —  A  general  formula  is  given  for  the  MSE 
performance  of  affine  index  assignments  for  a  binary 
symmetric  channel  with  an  arbitrary  source  and  a  bi¬ 
nary  lattice  quantizer.  The  result  is  then  used  to  com¬ 
pare  some  well-known  redundancy  free  codes.  The  bi¬ 
nary  asymmetric  channel  is  considered  for  a  uniform 
input  distribution  and  a  class  of  affine  codes. 

Two major issues in noisy channel vector quantization are
complexity and sensitivity to channel errors. Structured vector
quantizers and index assignments provide a low complexity
solution for enhancing channel robustness.

A d-dimensional, n-bit noisy channel VQ with index set
$\mathcal{I} = \{0, 1, \ldots, 2^n - 1\}$ and code book $\mathcal{C} = \{y_i \in \mathbb{R}^d : i \in \mathcal{I}\}$
is a functional composition $Q = \mathcal{D} \circ \pi^{-1} \circ \eta \circ \pi \circ \mathcal{E}$, where
$\mathcal{E}: \mathbb{R}^d \to \mathcal{I}$ is the quantizer encoder, $\mathcal{D}: \mathcal{I} \to \mathcal{C}$ is the quantizer
decoder, $\pi: \mathcal{I} \to \mathcal{I}$ is the index permutation, and $\eta: \mathcal{I} \to \mathcal{I}$ is
a random permutation representing the channel.

A binary lattice quantizer is a vector quantizer whose codevectors
are of the form $y_i = y_0 + \sum_{l=0}^{n-1} v_l i_l$ for $i \in \mathcal{I}$, where
the ordered set of vectors $V = \{v_l\}_{l=0}^{n-1}$ is called the generating
set, and $i_l \in \{0, 1\}$ is the $l$th bit in the binary expansion of
the index i (here $i_0$ is the LSB). A binary lattice quantizer is
equivalent to a direct sum quantizer (or multistage or residual
quantizer) with two code vectors per stage. Examples include
truncated lattice vector quantizers (e.g. uniform quantizers).
A binary lattice VQ is similar to the non-redundant version
of the LMBC-VQ (VQ by Linear Mappings of Block Codes)
presented in [3].

An affine index assignment is an assignment of the form

$$\pi(i) = iG \oplus d, \qquad \pi^{-1}(i) = (i \oplus d)F, \qquad (F = G^{-1})$$

where G is the generator matrix, d is the translation vector,
and the operations are performed over GF(2). Many popular
redundancy free codes are affine, including the Natural Binary
Code (NBC), the Folded Binary Code (FBC), and the Gray
Code (GC).
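
The affine property can be checked directly on small codes. The sketch below is an illustration under assumptions: it represents indices as integers, uses the usual reflected Gray code, and adopts one common convention for the Folded Binary Code (sign bit in the MSB, magnitude bits complemented in the lower half). A map is affine over GF(2) exactly when $\pi(a \oplus b) = \pi(a) \oplus \pi(b) \oplus \pi(0)$ for all a, b; the helper names are hypothetical.

```python
def nbc(i, n):
    """Natural Binary Code: the identity assignment."""
    return i

def gray(i, n):
    """Reflected Gray code."""
    return i ^ (i >> 1)

def fbc(i, n):
    """Folded Binary Code (assumed convention): MSB is the sign bit,
    lower bits are the magnitude, complemented in the lower half."""
    msb = 1 << (n - 1)
    return i if i & msb else i ^ (msb - 1)

def is_affine(pi, n):
    """Check pi(a ^ b) == pi(a) ^ pi(b) ^ pi(0) for all index pairs."""
    d = pi(0, n)
    size = 1 << n
    return all(pi(a ^ b, n) == pi(a, n) ^ pi(b, n) ^ d
               for a in range(size) for b in range(size))

def affine_parts(pi, n):
    """Return (rows of G, d) with pi(i) = iG xor d; row l is the image of bit l."""
    d = pi(0, n)
    return [pi(1 << l, n) ^ d for l in range(n)], d

for name, code in (("NBC", nbc), ("FBC", fbc), ("GC", gray)):
    print(name, is_affine(code, 4), affine_parts(code, 3))
```

Each row of G is printed as an integer whose bits are the row entries, so the NBC yields the identity matrix and a zero translation vector.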

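For a given source X, the Hadamard transform of its distribution
is defined as $\hat{P}_l = \sum_{i} P[\mathcal{E}(X) = i]\,(-1)^{\langle i, l \rangle}$, where $\langle i, l \rangle$
denotes the inner product of the binary expansions of i and l.

The MSE of a quantizer that satisfies the centroid condition
can be decomposed as $D = D_S + D_C$, where

$$D_S = \sum_{i} E\left[\|X - y_i\|^2 \mid \mathcal{E}(X) = i\right] P[\mathcal{E}(X) = i]$$

$$D_C = \sum_{i} \sum_{j} \|y_i - y_j\|^2 \, P[\mathcal{E}(X) = i] \, P[\pi(j) \mid \pi(i)].$$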

¹The research was supported in part by the National Science
Foundation under Grants No. NCR-92-96231 and INT-93-15271.

Theorem 1 The channel distortion of a $2^n$ point binary lattice
vector quantizer with generating set $\{v_l\}_{l=0}^{n-1}$, which uses
an affine index assignment with generator matrix G to transmit
across a binary symmetric channel with crossover probability q,
is given by

$$D_C = \frac{1}{4} \sum_{k=0}^{n-1} \sum_{l=0}^{n-1} \langle v_k, v_l \rangle \, \hat{P}_{2^k \oplus 2^l} \left(1 - 2(1 - 2q)^{w(f^{n-k})} + (1 - 2q)^{w(f^{n-k} \oplus f^{n-l})}\right)$$

where $w(\cdot)$ denotes Hamming weight, $f^k = [f_{1,k}, \ldots, f_{n,k}]$ is
the kth column of $F = G^{-1}$, $\hat{P}_l$ is the lth component of the
Hadamard transform of the induced discrete distribution on
the encoder cells, and $\oplus$ indicates modulo 2 addition.

Let  FBC*  denote  the  “best”  Folded  Binary  Code  obtained 
by  reordering  the  generating  set  V  to  minimize  Dc,  and  let  U 
be  a  uniform  discrete  random  variable  on  the  code  points. 

Corollary 1 Given the conditions of Theorem 1 (and q <
1/2), $D_C^{(FBC^*)} > D_C^{(NBC)}$ if and only if

Var[Q(X)J  +  E2[Q(X)  -  U]  >  Var[U] 

Z^vgv  Nvil 

For a uniform (discrete) distribution on the code vectors
and a binary symmetric channel, the NBC is the optimal index
assignment [1], [2]. An affine translate of the NBC is an index
assignment of the form $\pi(i) = i \oplus d = \pi^{-1}(i)$.

Theorem 2 If a $2^n$ point binary lattice vector quantizer
induces equiprobable encoder cells for a given source, and
transmits an affine translation of the Natural Binary Code
across a binary asymmetric channel with crossover probabilities
P[1|0] = p and P[0|1] = q, then the channel distortion is
minimized if and only if the translation vector d satisfies

$$d = \arg\min_{i} \|y_i - E[U]\|$$
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Abstract  — A  quantizer  design  algorithm  for  trans¬ 
mission  over  finite  state  channels  is  presented.  Opti¬ 
mal  design  algorithms  for  a  variety  of  conditions  re¬ 
garding  the  knowledge  of  the  state  information  at  the 
transmitter  and  the  receiver  are  derived.  Both  cases 
of  noiseless  and  noisy  observations  are  considered. 

I.  Summary 

We want to transmit the output of an information
source to a receiver over a finite state channel with two
states. In general, the entropy rate of the source is too
high, and therefore we need to quantize the source output
to make it suitable for transmission. Our objective
is to design the quantizer to minimize the mean squared
error (MSE) when the channel is in state $S_1$, subject to
a constraint on the MSE when the channel is in state $S_2$.
In other words, the problem is to minimize

$$D_1 = E[(X - \hat{X})^2 \mid \text{channel state is } S_1]$$

subject to

$$D_2 = E[(X - \hat{X})^2 \mid \text{channel state is } S_2] \le D$$

where X is the source output, $\hat{X}$ is the reconstructed
output at the receiver, and D is the maximum allowable
distortion when the channel state is $S_2$.

Let $P_m(k|i)$ be the probability of receiving k as the
channel output when the channel input is i and the channel
state is $S_m$, where $m = 1, 2$, $i \in \{1, 2, \ldots, N_1\}$ and
$k \in \{1, 2, \ldots, N_2\}$. We also assume that noisy state
information is available both at the transmitter and the
receiver. Let $t_{ji}$ and $r_{ji}$ be the probabilities that state $S_i$ is
perceived as state $S_j$ at the transmitter and the receiver,
respectively. Denote the ith quantization region by $A_{mi}$
when the channel state is perceived as $S_m$ at the transmitter,
and the kth reconstruction level by $g_n(k)$ when
the channel state is perceived as $S_n$ at the receiver.

To design the quantizer, our approach is to convert the
constrained optimization problem to an equivalent unconstrained
minimization problem by the method of Lagrange
multipliers, i.e., to minimize $L = D_1 + \lambda(D_2 - D)$,
where $\lambda > 0$ is a constant. By optimizing the encoder structure
for a fixed decoder and the decoder structure for a fixed
encoder, we obtain the necessary conditions for the optimality
of the quantizers. This results in the following
algorithm for the quantizer design as derived in [1].

Algorithm: Optimal quantizer design to minimize L for
a fixed $\lambda$.

1)  Start with an initial encoder structure.

2)  Find the optimal decoder structure for the current
encoder structure by using

$$g_n(k) = E[X \mid \text{channel output is } k \text{ and channel state is } S_n] \qquad (1)$$

for all $k \in \{1, 2, \ldots, N_2\}$, $n = 1, 2$.

3)  Find the optimal encoder structure for the current
decoder structure by using

$$A_{mi} = \{x : 2\alpha_{mi} x - \beta_{mi} \ge 2\alpha_{ml} x - \beta_{ml}, \ \forall l \in \{1, 2, \ldots, N_1\}\} \qquad (2)$$

where

$$\alpha_{mi} = \sum_{j=1}^{2} \sum_{n=1}^{2} \sum_{k=1}^{N_2} \lambda_j t_{mj} r_{nj} P_j(k|i) \, g_n(k),$$

$$\beta_{mi} = \sum_{j=1}^{2} \sum_{n=1}^{2} \sum_{k=1}^{N_2} \lambda_j t_{mj} r_{nj} P_j(k|i) \, g_n^2(k),$$

$i \in \{1, 2, \ldots, N_1\}$, $m = 1, 2$, $\lambda_1 = 1$, $\lambda_2 = \lambda$.

4)  If the change in L is below a prespecified threshold,
stop. Otherwise, go to step 2.

¹This work was partially supported by the NSF Grant
NCR-9101560.

The Lagrangian is non-increasing at each step and is
bounded from below, therefore the algorithm is always
convergent. A numerically efficient algorithm to find $A_{mi}$
in (2) is presented in [2].
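
As a rough illustration of the encoder update in step 3 (not the numerically efficient method of [2]), the sketch below assigns a source sample x to the index i that maximizes $2\alpha_{mi} x - \beta_{mi}$. It assumes the weighted forms $\alpha_{mi} = \sum_j \sum_n \sum_k \lambda_j t_{mj} r_{nj} P_j(k|i) g_n(k)$ and $\beta_{mi}$ with $g_n^2(k)$ in place of $g_n(k)$, where $\lambda_1 = 1$ and $\lambda_2 = \lambda$. The array layout, zero-based indexing and function names are hypothetical.

```python
def encoder_coeffs(m, i, t, r, P, g, lam):
    """alpha_mi and beta_mi for perceived transmitter state m and index i.

    t[m][j]    : P(true state j is perceived as m at the transmitter)
    r[n][j]    : P(true state j is perceived as n at the receiver)
    P[j][i][k] : P(channel output k | channel input i, true state j)
    g[n][k]    : reconstruction level for output k, perceived receiver state n
    lam[j]     : Lagrangian weight of state j (lam[0] = 1, lam[1] = lambda)
    """
    alpha = 0.0
    beta = 0.0
    for j in range(len(lam)):            # true channel state
        for n in range(len(g)):          # state perceived at the receiver
            for k in range(len(g[n])):   # channel output
                weight = lam[j] * t[m][j] * r[n][j] * P[j][i][k]
                alpha += weight * g[n][k]
                beta += weight * g[n][k] ** 2
    return alpha, beta

def encode(x, m, t, r, P, g, lam):
    """Index whose region A_mi contains x (largest 2*alpha*x - beta)."""
    def score(i):
        alpha, beta = encoder_coeffs(m, i, t, r, P, g, lam)
        return 2.0 * alpha * x - beta
    return max(range(len(P[0])), key=score)

# Sanity case: perfect state information and a noiseless channel.
perfect = [[1.0, 0.0], [0.0, 1.0]]
clean = [[1.0 if k == i else 0.0 for k in range(3)] for i in range(3)]
P = [clean, clean]
g = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]]
lam = [1.0, 0.5]
print(encode(0.6, 0, perfect, perfect, P, g, lam))
print(encode(0.6, 1, perfect, perfect, P, g, lam))
```

In this degenerate case the rule reduces to nearest-neighbor encoding against the reconstruction levels of the perceived state, which is a convenient check on the index conventions before adding channel noise or state mismatch.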

In  order  to  complete  the  solution  of  the  problem  one 
has to vary the Lagrange multiplier ($\lambda > 0$), apply the
design  algorithm,  obtain  a  set  of  achievable  distortion 
pairs,  and  convexify  these  points.  The  last  step  is  justified 
by  the  use  of  time-sharing.  In  [1]  it  is  illustrated  that,  in 
general,  time-sharing  is  necessary  to  obtain  the  optimal 
performance  of  the  system.  A  set  of  numerical  examples 
where  the  above  algorithm  is  employed  is  also  presented. 

We also consider the quantizer design problem when
the observation is noisy. It is shown that the problem
of optimal quantizer design for noisy observations can
be separated into two parts: first estimating X from
the observation in the MSE sense, and then using the
quantizer design algorithm for no observation noise.

Abstract - The probabilistic analysis of the adaptive
quantizer DH is presented. For the case when the
number of quantizer levels is equal to 4, Mean square
error - Average Entropy functions are calculated.

I.  Introduction 

Adaptive DPCM (ADPCM) has been recommended by
CCITT [1] for implementation in communication systems.

The exact mathematical analysis of APCM systems is
rather sophisticated [2], and therefore no complete
analysis of any of them is available. In this paper, we present
the complete probabilistic analysis of a simple variant of the
adaptive quantizer DH [3], as well as a comparison with
Max's quantizer [4].

II.  Main  Performance 

Consider the adaptive uniform quantizer DH with N = 4
quantizer levels, a variable quantizer step size h, and
a variable quantizer range d. The step and the
range at the sampling instant $t_{k+1}$ depend on their values and
on the value of the input signal at the preceding sampling instant
$t_k$ (see [3] for details). The adaptive quantizer is equivalent
to two virtual quantizers with steps $h_1 = h$ and $h_2 = 2h$,
respectively. The first quantizer is used for small values
of the input signal, and the second one is used for large
values.

The  adaptive  quantizer  is  designed  to  reduce  the  value  of 
the  entropy  of  quantized  signal  for  a  given  error  in  comparison 
with  Max’s  nonadaptive  quantizer  [4]. 

The main performance measure is the Mean square error - Average
Entropy function. In [3], it is shown that the joint probability
distribution of the input signal and parameters $w_i(y)$ (i = 1, 2)
is given by the equations

$$w_1(y) = \int_{|x| < h_1} \left(w_1(x, y) + w_2(x, y)\right) dx; \qquad w_2(y) = w(y) - w_1(y),$$

where w(y) is the one-dimensional probability density of the input
signal, and $w_i(x, y)$ is the joint probability of the samples x, y and
parameters i = 1, 2. Put $h_1 = h$, $h_2 = 2h$.

For this case the solution of the equations is as follows:

$$w_1(y) = \int_{|x| < h} w(x, y)\, dx; \qquad w_2(y) = \int_{|x| > h} w(x, y)\, dx,$$

where w(x, y) is the two-dimensional probability density of the
input signal.

We consider the Gaussian input signal with zero mean
value, variance 1 and correlation coefficient $\rho$ between two
adjacent samples. For this case, the average mean square error
and the average entropy can be rewritten as follows:

$$e^2 = 2\int_0^{h} (y - h/2)^2 w(y) f_1(y)\, dy + 2\int_h^{\infty} (y - 3h/2)^2 w(y) f_1(y)\, dy$$
$$\qquad + 2\int_0^{2h} (y - h)^2 w(y) f_2(y)\, dy + 2\int_{2h}^{\infty} (y - 3h)^2 w(y) f_2(y)\, dy,$$

$$H = -2\int_0^{h} w(y) f_1(y)\, dy \, \log\left(\frac{1}{Q_1}\int_0^{h} w(y) f_1(y)\, dy\right)$$
$$\qquad - 2\int_h^{\infty} w(y) f_1(y)\, dy \, \log\left(\frac{1}{Q_1}\int_h^{\infty} w(y) f_1(y)\, dy\right)$$
$$\qquad - 2\int_0^{2h} w(y) f_2(y)\, dy \, \log\left(\frac{1}{Q_2}\int_0^{2h} w(y) f_2(y)\, dy\right)$$
$$\qquad - 2\int_{2h}^{\infty} w(y) f_2(y)\, dy \, \log\left(\frac{1}{Q_2}\int_{2h}^{\infty} w(y) f_2(y)\, dy\right),$$

where

$$w(y) = \exp(-y^2/2)/\sqrt{2\pi},$$

$$f_1(y) = \frac{1}{\sqrt{2\pi}} \int_{-(h + \rho y)/\sqrt{1 - \rho^2}}^{(h - \rho y)/\sqrt{1 - \rho^2}} \exp(-z^2/2)\, dz; \qquad f_2(y) = 1 - f_1(y),$$

$$Q_1 = \int_{|x| < h} w(x)\, dx; \qquad Q_2 = 1 - Q_1.$$

III.  Numerical Results

The main performance measure, i.e., the Mean square error - Average
entropy function, is presented in the figure for the nonadaptive
quantizer (1), and for the adaptive quantizer with $\rho$ =
0, 0.9, 0.99, 1 (2, 3, 4, 5). One can see that in the region where
the error is minimal, the value of the error changes slowly with
H. Hence, we can get an extra gain in data compression if
we use suboptimal values of h. For example, if the error is
allowed to be $e^2 = 0.119$, which is minimal for the nonadaptive
quantizer, then we can get a gain of about 0.62 bit
per sample for large $\rho$.
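
The mean square error and average entropy of the four-level adaptive quantizer can be evaluated numerically. The sketch below is an illustration under assumptions: a unit-variance Gaussian input with correlation coefficient rho < 1, base-2 logarithms (entropy in bits), Simpson integration truncated at y = 10, and $f_1(y)$ taken as the conditional probability that the previous sample fell in $|x| < h$. The function names and the integration settings are hypothetical.

```python
import math

def w(y):
    """Standard Gaussian density."""
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard Gaussian distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def f1(y, h, rho):
    """P(|x| < h | y) for jointly Gaussian (x, y) with correlation rho < 1."""
    s = math.sqrt(1.0 - rho * rho)
    return Phi((h - rho * y) / s) - Phi(-(h + rho * y) / s)

def simpson(fn, a, b, steps=2000):
    """Composite Simpson rule; steps must be even."""
    width = (b - a) / steps
    total = fn(a) + fn(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * fn(a + k * width)
    return total * width / 3.0

def performance(h, rho, ymax=10.0):
    """Return (mean square error, average entropy in bits) for step h."""
    q1 = 2.0 * Phi(h) - 1.0
    q2 = 1.0 - q1
    first = lambda y: f1(y, h, rho)
    second = lambda y: 1.0 - f1(y, h, rho)
    # (selector, lower limit, upper limit, output level, quantizer probability)
    cells = [(first, 0.0, h, h / 2.0, q1),
             (first, h, ymax, 1.5 * h, q1),
             (second, 0.0, 2.0 * h, h, q2),
             (second, 2.0 * h, ymax, 3.0 * h, q2)]
    e2 = 0.0
    entropy = 0.0
    for f, a, b, level, q in cells:
        mass = simpson(lambda y: w(y) * f(y), a, b)
        e2 += 2.0 * simpson(lambda y: (y - level) ** 2 * w(y) * f(y), a, b)
        entropy -= 2.0 * mass * math.log2(mass / q)
    return e2, entropy

print(performance(1.0, 0.9))
```

Sweeping h for a fixed rho and plotting the resulting (entropy, error) pairs would trace one curve of the kind described in the numerical results.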
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I.  Introduction 

Mobile  telephony  in  CDMA  channels  encounters  a  variety  of 
communication  challenges  including  fading  due  to  multipath 
(MP)  and  multiaccess  interference  (MAI)  due  to  simultane¬ 
ous  transmissions  from  interfering  users.  Detectors  which  em¬ 
ploy  multiuser  detection  and  temporal  (RAKE-type)  combin¬ 
ing  have  been  shown  [1]  to  provide  near-far  resistant  solutions 
which  effectively  combat  both  of  these  impediments.  In  the 
first  part  of  this  paper,  we  address  the  potential  gains  of  us¬ 
ing  spatial  combining  in  conjunction  with  multiuser  detection 
and  temporal  combining.  In  the  second  part  of  the  paper,  we 
examine  an  adaptive  multiuser  detector  which  is  well  suited 
for MAI-limited MP channels.

II.  Multiuser  Array  Detection  for  Multipath 
Channels 

Recently,  efforts  have  been  made  to  combine  the  use  of  tem¬ 
poral  combining  and  spatial  combining.  Most  of  these  efforts 
are  based  on  conventional  detection  schemes  which  have  been 
shown  to  be  near-far  limited.  In  the  first  part  of  this  paper,  we 
combine  results  from  [1]  and  [2]  to  derive  a  class  of  near-far 
resistant  detectors  which  uses  a  linear  multiuser  detector  in 
conjunction  with  spatial  and  temporal  combiners.  It  is  shown 
that the optimum (in terms of near-far resistance) linear
multiuser detector with an array of P sensors consists of a bank
of matched filters at each sensor matched to the users' delayed
spreading codes, followed by a spatial combiner (which acts
as a beamformer pointing in the direction of each user's MP
signals), a temporal combiner (which coherently combines a
user's MP components), and a linear transformation which
decorrelates the users. Since this decorrelation process relies
on the estimates of the signals' spatial and MP parameters,
this detector (known as the spatial-temporal decorrelator
(stD)) is near-far limited when the estimates are not exact. By
interchanging the order of the three processors, we can obtain
two suboptimum detectors, the sDt and Dst, which, respectively,
retain their near-far resistant characteristics when there
is MP parameter mismatch and when there is both MP and
spatial parameter mismatch. If all of the system parameters
are known exactly, we have the following relationship among
their respective bit error rates as a function of the noise level:
$P_{stD}(\sigma) \le P_{sDt}(\sigma) \le P_{Dst}(\sigma)$. This result is illustrated in

Figure  1  for  a  2-user  synchronous,  coherent  system  where  each 
user  contributes  L  =  2  MP  components  and  where  there  are 
P  =  2  sensors. 
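
To make the notion of near-far resistance concrete, here is a minimal sketch of the decorrelating step alone, for a noise-free, synchronous, single-path 2-user channel. It leaves out the spatial and temporal combiners described above, and the cross-correlation `rho`, the amplitudes, and all function names are assumptions introduced for illustration.

```python
def sign(x):
    """Hard limiter returning +1 or -1."""
    return 1 if x >= 0 else -1


def matched_filter_outputs(rho, amplitudes, bits):
    """Noise-free matched-filter outputs y = R A b for two users.

    R = [[1, rho], [rho, 1]] is the assumed code cross-correlation matrix.
    """
    s1 = amplitudes[0] * bits[0]
    s2 = amplitudes[1] * bits[1]
    return [s1 + rho * s2, rho * s1 + s2]


def conventional_detector(y):
    """Single-user detection: hard-limit each matched-filter output."""
    return [sign(v) for v in y]


def decorrelating_detector(y, rho):
    """Apply R^{-1} to the matched-filter outputs, then hard-limit."""
    det = 1.0 - rho * rho
    z1 = (y[0] - rho * y[1]) / det
    z2 = (y[1] - rho * y[0]) / det
    return [sign(z1), sign(z2)]


if __name__ == "__main__":
    # A weak user 1 next to a user 2 that is ten times stronger.
    y = matched_filter_outputs(0.5, [1.0, 10.0], [1, -1])
    print(conventional_detector(y))        # user 1 is swamped by user 2
    print(decorrelating_detector(y, 0.5))  # both bits recovered
```

In this example the conventional detector gets the weak user's bit wrong, while the decorrelator recovers both bits regardless of the power ratio, provided the correlation is known exactly.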

III.  Blind  Adaptive  Detection  for 
Multidimensional  Signals 

Motivated by the need for a noncoherent multiuser detector
for MP channels which has no a priori knowledge of the
interfering users, in the second part of the paper we derive an
extension of the blind adaptive detector [3] for differentially
encoded, multidimensional signals. Such a detector is ideally
suited for MP channels since, if we assume negligible ISI, the
spanning set for the multidimensional subspace is given by the
truncated, delayed translates of the desired user's spreading
code. Given the L-dimensional subspace in which the desired
user's signal lies, we can obtain an arbitrary orthogonal basis
$z_1, \ldots, z_L$. The resulting detector consists of a bank of L linear
filters followed by an inner-product operation between the
current filter bank output and that from the previous bit interval;
the bit estimate is the hard limit of this inner product. The
lth filter consists of a real part $(z_l + x_l^R)/\|z_l + x_l^R\|$, $l = 1, \ldots, L$,
which operates on the real part of the received signal, and a
corresponding imaginary part. The $x_l^R$ and $x_l^I$ are each
constrained to be orthogonal to all of the basis vectors $z_1, \ldots, z_L$
and are obtained adaptively using the output energy of the
respective real and imaginary part of the lth filter. Each of
the $x_l^R$ and $x_l^I$ can be adapted independently, exhibits global
convergence, and requires knowledge of only $z_l$ and the timing
(bit epoch) of the desired user. Hence this detector requires
even less knowledge than the conventional RAKE receiver; yet,
as seen in Figure 1, for a 2-user system with L = 2, it
essentially achieves the same performance as the optimum linear
MP multiuser detector (equivalent to the differentially
coherent stD with P = 1).
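
The decision rule described above (inner product between the current and previous filter-bank outputs, followed by a hard limit) can be sketched as follows. This is only an illustration of the differential decision step: the adaptive filters themselves are not modeled, the filter-bank outputs are assumed to be given as lists of complex numbers, and the function name is hypothetical.

```python
def differential_decisions(filter_outputs):
    """Differentially detect bits from successive filter-bank outputs.

    filter_outputs[i] is the length-L list of complex filter outputs for
    bit interval i. The estimate for interval i (i >= 1) is the sign of
    the real part of the inner product with the outputs of interval i - 1.
    """
    decisions = []
    for prev, curr in zip(filter_outputs, filter_outputs[1:]):
        inner = sum(c * p.conjugate() for c, p in zip(curr, prev))
        decisions.append(1 if inner.real >= 0 else -1)
    return decisions


if __name__ == "__main__":
    # Unknown complex path gains (L = 2) and differentially encoded bits.
    gains = [0.8 + 0.3j, -0.2 + 0.5j]
    bits = [1, -1, -1, 1]
    encoded = [1]
    for b in bits:
        encoded.append(encoded[-1] * b)
    outputs = [[d * g for g in gains] for d in encoded]
    print(differential_decisions(outputs))
```

Because the unknown path gains appear in both factors of the inner product, they contribute only a positive scale factor, which is why no channel estimate is needed in this noise-free illustration.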


Figure 1: Performance of Multiuser Detectors (L = 2)
Summary 

Multi-user communication scenarios, to date, have
almost exclusively focussed on the situation in which all
users share a common symbol rate 1/T. However, future
multi-media services will require that users with different
(and possibly time-varying) data rates share a common
transmission channel. We consider the problem of optimizing
a multi-user receiver when the users transmit with
different symbol rates. The problem of optimizing the
transmitter pulse shaping filters for each user assuming
different transmitted symbol rates is also considered. We
assume the Minimum Mean Squared Error (MMSE)
performance criterion.


The kth user generates a sequence of pulses

$s_k(t) = \sum_i b_k[i]\,\delta(t - iT_k)$

where $\{b_k[i]\}$ is the sequence of symbols corresponding
to user k, and $1/T_k$ is user k's symbol rate. This signal is
the input to a pulse shaping filter with transfer function
$P_k(f)$. The channel corresponding to user k is $H_k(f)$, and
the additive noise n(t) is assumed to be white. The
received signal is therefore

$y(t) = \sum_{k=1}^{K} \sum_i b_k[i]\,(p_k * h_k)(t - iT_k) + n(t)$

where K is the number of users, and $p_k * h_k$ is the
convolution of the transmitted pulse shape with the channel
impulse response. We will assume that there exist
nonnegative integers $m_1, \ldots, m_K$ such that

$T_1 : T_2 : \cdots : T_K = m_1 : m_2 : \cdots : m_K$, where

$m_1 < m_2 < \cdots < m_K$, implying $T_1 > T_2 > \cdots > T_K$.

For systems with multiple rates, the optimum
receiver is periodically time-varying, due to the underlying
cyclostationarity of the sampled received signal. The
approach we take is to embed the optimum receiver design
problem into an equivalent higher-dimensional problem
that is wide-sense stationary. To do so, we ‘decompose’
the input data stream from user k into $\mathrm{LCM}(m)/m_k$
low-rate streams each with a common symbol period

$\frac{\mathrm{LCM}(m)}{m_k}\,T_k$

where $m = (m_1, \ldots, m_K)$, and LCM( ) denotes the least
common multiple of the elements of the vector argument.
It is convenient to think of the additional streams created
by this process as ‘fictitious’ new users in the system. This
procedure effectively yields an equivalent higher-dimensional,
single-symbol-rate multi-input, multi-output communication
system with input dimension (corresponding
to the total number of ‘users’) equal to

$\sum_{k=1}^{K} \frac{\mathrm{LCM}(m)}{m_k}$.

Note that for the case $m_k = 1$ for all k, this reduces to
multi-user communication with identical symbol periods.
Accordingly, the vector of channel transfer functions
which corresponds to the embedded system with equal
symbol rates is given by

$H(f) = [\,\cdots\,;\ H_K(f)\,e^{\cdots}, \ldots, H_K(f)\,e^{j 2\pi f \tilde{m} T_K}\,]$

where $\tilde{m} = \mathrm{LCM}(m)/m_K - 1$.
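
The bookkeeping behind this decomposition can be sketched in a few lines. The sketch computes LCM(m), the number of low-rate streams per user, and the total number of ‘fictitious’ users, and it splits a symbol sequence into interleaved sub-streams. The polyphase-style split and the function names are assumptions made for illustration; the text does not specify how symbols are assigned to streams.

```python
from functools import reduce
from math import gcd


def lcm_of(values):
    """Least common multiple of a sequence of positive integers."""
    return reduce(lambda a, b: a * b // gcd(a, b), values)


def stream_counts(m):
    """Number of low-rate streams LCM(m) / m_k created for each user k."""
    common = lcm_of(m)
    return [common // mk for mk in m]


def decompose(symbols, num_streams):
    """Split one symbol sequence into num_streams interleaved sub-streams.

    Stream j receives symbols j, j + num_streams, j + 2 * num_streams, ...
    (an assumed assignment; any fixed periodic assignment would do).
    """
    return [symbols[j::num_streams] for j in range(num_streams)]


if __name__ == "__main__":
    m = [1, 2, 3]
    counts = stream_counts(m)
    print(counts, sum(counts))            # streams per user, total 'users'
    print(decompose(list(range(6)), counts[1]))
```

For m = (1, 2, 3) this gives 6, 3, and 2 streams, so the equivalent single-rate system has 11 inputs; for $m_k = 1$ for all k each user keeps a single stream.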

We also consider the optimization of the transmitter
pulses $p_1(t), \ldots, p_K(t)$ subject to the power constraints

$\int_{-\infty}^{\infty} |P_k(f)|^2\,df \le \Pi_k, \quad k = 1, \ldots, K,$

assuming a linear MMSE receiver. Using the preceding
decomposition technique, the problem can again be embedded
in a higher-dimensional problem in which the users transmit
with the same symbol rate. Necessary conditions for optimality
can be derived, and show that FDMA achieves a local
optimum (which may be globally optimal). The FDMA
solution differs from that given in [1], in that for a particular
frequency $0 < f < 1/(2 m_1 T_1)$, user 1 (user 2) can place
power at up to $m_1$ ($m_2$) different Nyquist translates (that
is, $f + i/(m_1 T_1)$ for different nonnegative integers i).

Numerical  results  will  be  presented  that  illustrate 
the  tradeoff  between  changing  symbol  rates  and  changing 
the  number  of  constellation  points  to  achieve  a  given  mix 
of  data  rates. 
Abstract  —  Receiver  structures  based  on  joint 
MMSE  diversity  combining,  equalization  and  multi¬ 
ple  access  interference  suppression  are  discussed.  It 
is  shown  that  receiver  complexity  can  substantially  be 
reduced  by  exploiting  the  structure  of  multipath.  Ex¬ 
perimental  results,  obtained  in  an  underwater  acous¬ 
tic  channel,  demonstrate  superior  capabilities  of  the 
receivers  proposed. 

I.  Introduction 

Due  to  their  superior  performance,  multiuser  receivers 
are  being  considered  for  applications  ranging  from  wideband 
CDMA  systems  to  bandwidth-efficient  multiple-access  under¬ 
water  acoustic  (UWA)  communication  channels  [1],  [2].  In 
severely dispersive time-varying channels, multipath propagation
presents a major limitation to the system performance. In
such a case, multisensor signal processing offers the potential for
robustness to fading, reduction of residual intersymbol
interference (ISI) [3] and suppression of multiple-access interference
(MAI).

II.  Receiver  Structure 

We  address  the  general  case  of  a  multipoint-to-point  com¬ 
munication  system  where  multiuser  signals  are  subject  to  ISI 
and  may  overlap  in  both  time  and  frequency.  Assuming  the 
presence  of  L  users  in  a  system  with  K  receiving  elements,  the 
optimal  receiver  consists  of  a  combiner  followed  by  a  sequence 
detector, as shown in Fig. 1. The lth combiner is optimally


Figure  1:  Optimal  receiver. 

represented  as  a  bank  of  K  matched  filters  whose  outputs  are 
summed  and  sampled  at  the  symbol  rate.  The  L  discrete-time 
combiner  outputs  are  processed  by  an  L  x  L  detector,  chosen 
as  a  MIMO  DFE  [4].  When  the  channel  is  not  known,  the 
combiners  are  realized  as  banks  of  fractionally  spaced  adap¬ 
tive  filters. 

III.  Reduced-Complexity  Adaptive  Processing 

Although  the  use  of  an  equalizer  eliminates  the  exponen¬ 
tial  complexity  of  the  optimal  (MLSE)  detector,  the  resulting 
combiner/equalizer  structure  may  still  have  complexity  pro¬ 
hibitively  high  for  many  practical  cases.  Besides  the  increase 
in  computational  time,  a  critical  disadvantage  of  large  adap¬ 
tive  filters  lies  in  their  high  noise  enhancement,  which  ulti¬ 
mately  limits  the  gain  obtained  by  increasing  the  number  of 


input  channels.  These  issues  motivate  the  search  for  a  differ¬ 
ent  combining  strategy  in  which  the  size  of  the  combiner  will 
be  reduced,  but  multichannel  processing  gain  preserved. 

By  modeling  the  channel  as  consisting  of  a  finite  number 
of  propagation  paths,  it  is  revealed  that  the  optimal  com¬ 
biner  can  equivalently  be  realized  using  fewer  matched  filters. 
The resulting adaptive combiner is shown in Fig. 2.

Figure 2: Reduced-complexity adaptive combiner.

The pre-combiners $C_j$ perform spatial processing only, reducing the
number  of  channels  from  K  to  P  <  K  for  subsequent  multi¬ 
channel  equalization.  Shown  also  is  the  multichannel  phase- 
locked  loop  which  is  an  essential  part  of  a  practical  receiver. 

When  the  multipath  structure  is  not  known,  the  approach 
most  beneficial  is  to  conduct  unconstrained  optimization  of 
the  combiners  and  the  equalizers.  To  preserve  performance  of 
the full-complexity receiver, the pre-combiners and the
multichannel equalizers need to be optimized jointly. An adaptive
algorithm suitable for application in rapidly time-varying
channels is a combination of second-order gradient
updates for the carrier phases and multiple RLS updates for
the coefficients of the combiners and the equalizers.
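
For readers unfamiliar with RLS, the sketch below shows one step of the standard exponentially weighted RLS recursion for a single real-valued filter. It is a textbook form given for orientation only, not the joint combiner/equalizer algorithm of the paper (which is complex-valued and coupled with the phase updates); the function name, the forgetting factor default, and the list-based matrix representation are assumptions.

```python
def rls_update(w, P, x, d, lam=0.99):
    """One exponentially weighted RLS step for a real-valued linear filter.

    w   : current weight vector (list of floats)
    P   : current inverse-correlation matrix (symmetric list of lists)
    x   : input vector for this step
    d   : desired output for this step
    lam : forgetting factor, 0 < lam <= 1
    Returns the updated (w, P) and the a priori error.
    """
    n = len(w)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    err = d - sum(w[i] * x[i] for i in range(n))
    w_new = [w[i] + gain[i] * err for i in range(n)]
    # P is symmetric, so x^T P equals (P x)^T.
    P_new = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, P_new, err


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    true_w = [0.5, -0.3]                  # hypothetical 2-tap system
    w, P, prev = [0.0, 0.0], [[1e4, 0.0], [0.0, 1e4]], 0.0
    for _ in range(200):
        u = rng.uniform(-1.0, 1.0)
        x = [u, prev]
        d = true_w[0] * x[0] + true_w[1] * x[1]
        w, P, err = rls_update(w, P, x, d)
        prev = u
    print(w)
```

The forgetting factor below 1 is what lets the recursion follow a time-varying channel, at the cost of noisier estimates.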

The  methods  described  above  were  applied  to  the  real  data 
obtained  from  experiments  conducted  in  the  shallow  water 
acoustic  channel,  characterized  by  rapidly  time-varying  ISI 
which  extends  over  several  tens  of  symbol  intervals.  Due  to 
the  bandwidth  limitation  of  the  channel,  only  very  low  spread¬ 
ing  ratios  can  be  used  (e.g.,  3),  resulting  in  increased  MAI. 
The  proposed  techniques  demonstrated  superior  performance 
in  such  conditions. 
Abstract  —  A  flexible  iterative  receiver  is  proposed 
for  the  multiple  access  channel.  The  receiver  splits 
the  detection  problem  into  a  “single  user”  decoding 
step  followed  by  a  combining  step.  The  structure  of 
the  receiver  is  suitable  for  many  different  types  of 
multiple  access  channels. 

I.  Introduction 

Motivated  by  the  need  for  reduced  complexity,  the  success  of 
iterative  methods  for  decoding  of  concatenated  codes  [1],  and 
recent  theoretical  results  concerning  successive  cancellation  re¬ 
ceivers  [2],  we  propose  an  iterative  detector  for  co-operative 
detection  of  multiuser  systems,  which  for  reasons  that  will 
become  apparent,  we  have  named  the  consensus  decoder. 

The  detector  divides  the  detection  operation  into  two  parts 
-  a  “single  user”  estimation  step  (in  which  soft  decisions  are 
produced),  and  a  “multiuser”  combining  step. 

We shall consider a general m-user multiple access system,
in which user i transmits $X_i$, drawn independently of other users
from a finite alphabet $\mathcal{X}_i$, i = 1, 2, ..., m, according to the
distribution $p_i(x_i)$. The channel produces output symbols Y,
members of the alphabet $\mathcal{Y}$, according to transition
probabilities $p(y \mid x_1, x_2, \ldots, x_m)$.

II.  The  Consensus  Detector 

Consider an m-user system. The operation is as follows.
User i adds redundancy to its source data, $U_i$, via an
encoder, producing $X_i$. We shall restrict each $X_i$ to be drawn
from an identical alphabet, $\mathcal{X}$. Without loss of generality,
denote the members of $\mathcal{X}$ by $\{0, 1, \ldots, J-1\}$. The channel
outputs $Y \in \mathcal{Y}$, according to some transition probability,
$p(Y \mid X_1, X_2, \ldots, X_m)$. We shall denote the output alphabet

$\mathcal{Y} = \{0, 1, \ldots, K-1\}$.

The  detector  operates  as  follows. 

1. User i attempts to estimate $X_i$ given Y, treating other
users as noise. At each symbol interval, each single-user
detector outputs soft information, $p_i$, which is a vector
of probability estimates for each channel input symbol.
$p_i = [P(X_i = 0 \mid Y), \ldots, P(X_i = J-1 \mid Y)]$

2. The symbol estimator for user i forms a list of possible
channel outputs, $y_i$, due to the other m − 1 users.
This can be interpreted as an estimate of the channel,
treating the other users as part of the channel. Each
element of $y_i$ has associated with it a probability, which
is determined from $p_{-i}$.

3. The detector for user i now estimates $X_i$ given Y and
the list of possible channel outputs, once again outputting
soft information.

$p_i = [P(X_i = 0 \mid Y, y_i), \ldots, P(X_i = J-1 \mid Y, y_i)]$

1  Supported  in  part  by  Telecom  Australia  under  Contract 
No. 7368  and  by  the  Commonwealth  of  Australia  under  Interna¬ 
tional  S  &  T  Grant  No. 56. 


4.  Steps  2  and  3  are  now  repeated  as  many  times  as  de¬ 
sired. 

This  procedure  separates  the  detection  into  a  single  user  step 
(Step  3)  and  a  combining  step  (Step  2). 

In practice, it is impossible for the symbol estimator to
form the full list of possible channel inputs, since there will in
general be $K^m$ combinations. For example, in a 10-user system
with 8 channel input symbols, there are already about 1 billion
possibilities. Therefore, we shall only keep the L most likely
symbols, which can be found with a simple M-algorithm. This
is where the system complexity is reduced.
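
A minimal sketch of such a list reduction is given below. It assumes that the candidates are joint input symbols of the other users and that their probabilities are products of per-user soft information; the text leaves both points open, and it speaks of a list of channel outputs, so this is an illustration of the M-algorithm idea rather than the authors' procedure. The function name is hypothetical.

```python
def m_algorithm(soft_info, list_size):
    """Breadth-first search keeping the list_size most likely combinations.

    soft_info[u][s] is the probability estimate that user u sent symbol s.
    Combinations are built one user at a time; after each extension only
    the list_size most probable partial combinations survive.
    Returns a list of (combination, probability) pairs, best first.
    """
    survivors = [((), 1.0)]
    for probs in soft_info:
        extended = [(path + (s,), prob * ps)
                    for path, prob in survivors
                    for s, ps in enumerate(probs)]
        extended.sort(key=lambda item: item[1], reverse=True)
        survivors = extended[:list_size]
    return survivors


if __name__ == "__main__":
    soft = [[0.7, 0.3], [0.6, 0.4], [0.9, 0.1]]
    for combo, prob in m_algorithm(soft, 2):
        print(combo, prob)
    print(8 ** 10)  # size of the full list for 10 users and 8 symbols
```

The work per user is proportional to the list size times the alphabet size, instead of growing exponentially with the number of users.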

The final output of the detector is in a sense the set of
sequences on which the m detectors have “agreed”, hence
the name consensus detector. It is also simple to include
“confidence levels” in particular users as follows. If we
define a parameter, $c_i$, to be the confidence we have in user
i, 0 denoting no confidence and 1 denoting complete
confidence, we adjust the probability estimates from a particular
user as $p_i^* = c_i p_i + (1 - c_i) u$, where u is the uniform distribution,
$P(X_i = j) = 1/J$ for all $0 \le j \le J-1$. This has the effect
that as we have less faith in a particular user, we flatten out
their distribution, placing more uncertainty (entropy) in their
decision.
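
The confidence adjustment is a simple mixture with the uniform distribution, as the following sketch shows. The function names are illustrative; the entropy helper is included only to show the flattening effect numerically.

```python
import math


def apply_confidence(p, c):
    """Mix the soft information p with the uniform distribution.

    c = 1 leaves p unchanged; c = 0 replaces it by the uniform
    distribution over the J = len(p) input symbols.
    """
    J = len(p)
    return [c * pj + (1.0 - c) / J for pj in p]


def entropy_bits(p):
    """Entropy of a probability vector in bits."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)


if __name__ == "__main__":
    p = [0.85, 0.05, 0.05, 0.05]
    for c in (1.0, 0.5, 0.0):
        q = apply_confidence(p, c)
        print(c, q, entropy_bits(q))
```

Lowering the confidence never reduces the entropy of the adjusted distribution, which matches the interpretation given in the text.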

III.  Discussion 

The  advantages  of  the  proposed  detector  are  as  follows. 

The  complexity  of  the  system  may  be  easily  varied  to  pro¬ 
vide  different  levels  of  performance.  Simulation  results  have 
shown  that  in  practice  only  L  10  likely  symbols  need  to 
be  retained.  The  number  of  iterations  can  also  be  varied.  In 
general at most 4-5 iterations are required, usually fewer. The
reduced  complexity  nature  of  the  detector  gives  a  complexity 
that  increases  only  linearly  with  the  number  of  users. 

The  structure  of  the  consensus  detector  is  suitable  for  use 
with  many  multiple  access  channels.  All  that  is  required  is  a 
suitable  single  user  detector  to  perform  step  3  of  the  algorithm. 

The  system  may  be  biased  according  to  previous  knowledge 
about  the  users,  for  example  different  power  levels,  through 
the  use  of  confidence  levels. 
Abstract  —  An  adaptive  near-far  resistant  technique 
for  the  blind  joint  multiuser  identification  and  detec¬ 
tion  in  asynchronous  CDMA  systems  is  analyzed  in 
fading  and  dispersive  GSM  channels. 

I.  Introduction 

Multiuser  detection  in  CDMA  systems  usually  requires 
either  knowledge  of  the  transmitted  signature  sequences 
and  channel  state  information  or  use  of  a  known  train¬ 
ing  sequence  for  adaptation.  Consequently  blind  adap¬ 
tive  multiuser  receivers  have  gained  considerable  atten¬ 
tion  [1].  We  recently  proposed  a  joint  multiuser  deconvo¬ 
lution  scheme  [2]  characterized  by: 

•  No  knowledge  of  timing,  channel  state  information 
or  signatures  nor  use  of  training  sequences  is  re¬ 
quired  for  any  user. 

•  The  estimate  of  the  signature  sequence  of  each 
user  convolved  with  its  physical  channel  impulse  re¬ 
sponse  is  provided  after  initial  convergence. 

•  The  blind  multiuser  detector  is  near-far  resistant. 

The  purpose  of  this  paper  is  to  further  investigate  the 
behavior  of  this  scheme  in  fading  and  dispersive  channels. 

II.  System  Model 

We consider the asynchronous CDMA channel

$r(t) = \sum_n \sum_{k=1}^{K} b_k[n]\,h_k(t - nT, t) + \sigma w(t)$   (1)

where $h_k(t - nT, t)$ is the overall complex channel impulse
response, given by the convolution of the signature
sequence, physical radio channel and the receiving filter
impulse responses. It incorporates the amplitude and the
delay for user k, and its duration is assumed to be smaller than
or equal to L symbols, i.e., $h_k(\tau, t) = 0$ for $\tau < 0$ and $\tau > LT$, $\forall t$.
The total number of active users is K and their transmitted
sequences are binary independent symbols $b_k[n] \in \{1, -1\}$.
The symbol rate is 1/T and w(t) is normalized
white Gaussian noise. The CDMA channel is sampled at
a rate $M/T = 1/T_s$ to derive the vector sequence r[n]

$r[n] = [r(nT), r(nT + T_s), \ldots, r(nT + (M-1)T_s)]^T$.   (2)

The observation r[n] is modeled as a probabilistic M-length
vector sequence of a state vector s[n]

$r[n] = \mathcal{H}[n]\,s[n] + w[n]$,   (3)

1Work  supported  by  CIRIT  of  Catalonia  (GRQ93-3021). 


where the (M × KL) matrix $\mathcal{H}[n]$ depends on the overall
discrete impulse response for all users and w[n] is the
normalized noise vector. There are $N = 2^{LK}$ possible state
vectors corresponding to L binary symbols of K users.
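
The size of this state space, which determines the complexity of a Viterbi search over it, can be made concrete with a short sketch. It enumerates all state vectors for small K and L and evaluates the noise-free part of the observation model; the ordering of symbols within a state vector and the function names are assumptions made for illustration.

```python
from itertools import product


def enumerate_states(num_users, memory):
    """All state vectors of K * L binary (+1 / -1) symbols."""
    return list(product((1, -1), repeat=num_users * memory))


def noiseless_observation(H, state):
    """Noise-free part H s of the observation, with H given as M rows."""
    return [sum(h * s for h, s in zip(row, state)) for row in H]


if __name__ == "__main__":
    states = enumerate_states(num_users=2, memory=2)
    print(len(states))                       # 2 ** (L * K)
    H = [[1.0, 0.5, 0.2, 0.0],               # hypothetical M = 2 rows
         [0.0, 1.0, 0.5, 0.2]]
    print(noiseless_observation(H, states[0]))
```

Even modest values such as K = 4 users and L = 3 symbols already give 4096 states, which indicates how quickly the trellis grows.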

III.  Blind  Identification  and  Detection 
Algorithm 

If  the  overall  impulse  response  for  each  user  was  known, 
that  is  if  the  signature  sequence,  physical  channel  im¬ 
pulse  response,  amplitude  and  delay  corresponding  to 
each  user  were  available,  then  using  this  information,  the 
Viterbi  algorithm  could  be  employed  to  determine  the 
multiuser  maximum-likelihood  transmitted  sequence.  In 
the method we presented, however, the Viterbi algorithm
is applied with current estimates of the overall impulse
responses, which are updated recursively after arbitrary
initialization. The number of users (K) is assumed known
together with a bound for the impulse response duration
(L). A similar approach was proposed for the blind
equalization of single-user channels using the Viterbi
algorithm [3] and the Baum-Welch identification algorithm
[4]. Specific to the multiuser approach is the procedure
which overcomes the convergence to a local minimum [2].

IV.  Behavior  in  Fading  and  Dispersive  Channels 
The  blind  multiuser  algorithm  has  been  tested  using  the 
mobile  radio  channel  model  for  typical  urban  areas  (Type 
1)  TUX60,  as  defined  in  [5].  Simulations  indicate  that, 
for  moderate  Doppler  frequency  (50  Hz)  and  multipath 
spread  (1.35  symbols),  convergence  can  still  be  attained 
within a few hundred symbols. Afterwards, the algorithm
is  still  able  to  track  slow  channel  variations.  Possible 
modification  of  the  receiver,  after  the  initial  convergence, 
may  include  a  simpler  decision-directed  MMSE  scheme. 

References 

[1] S. Verdu, “Adaptive Multiuser Detection,” Proc. Third
International Symposium on Spread Spectrum Techniques and
Applications, Oulu, Finland, pp. 43-50, July 1994.

[2] J. R. Fonollosa, J. A. R. Fonollosa, Z. Zvonar, and J. Vidal,
“Blind Multiuser Identification and Detection in CDMA
Systems,” Proc. IEEE ICASSP-95, pp. 1876-1879, May 1995.

[3] N. Seshadri, “Joint Data and Channel Estimation using Blind
Trellis Search Techniques,” IEEE Trans. on Communications,
vol. 42, pp. 1000-1011, March 1994.

[4] J. A. R. Fonollosa and J. Vidal, “Application of Hidden Markov
Models to Blind Channel Characterization and Data
Detection,” Proc. IEEE ICASSP-94, pp. IV 185-188, April 1994.

[5] GSM recommendation 05.05 (version 3.11.0).




On  the  Least  Possible  Decoding  Error  Probability  for  Truly 
Asynchronous  Single  Sequence  Hopping 

Sandor  Csibi 

Dept. of Telecom., Tech. Univ. of Budapest,

Stoczek u. 2, H-1111 Budapest, Hungary


Abstract  —  Unslotted  asynchronous  multiple  access 
without  feedback  is  considered.  Poisson  population, 
least  length  single  sequence  hopping  and  interleaved 
outer  coding  with  guard  spaces  are  assumed.  Bounds 
from  both  sides  on  the  least  decoding  error  probabil¬ 
ity  are  proved  to  vanish  with  rate  ^7  (from  a  given 
finite  source  block  length  k  on)  and  under  further 
conditions  defined  precisely  in  a  companion  preprint. 

I.  Introduction 

For  slotted  (frame)  asynchronous  least  length  single  se¬ 
quence  hopping  and  a  single  inner  R-S  code,  bounds  from  both 
sides  on  the  least  possible  decoding  error  probability  have  been 
already  obtained  by  the  same  author,  that  disappear  with  the 
source  block  length  k  at  rate  far  not  exponentially  ([l]). 
This is the price (in error probability) of being constrained to
this  simple  kind  of  multiple  access.  It  is,  obviously,  a  ques¬ 
tion  of  interest  under  what  additional  conditions  can,  for  the 
same  decoding  error  probability,  the  very  same  decay  rate  -7 
be proved also for truly (unslotted) asynchronous access, without
assuming any common clock for signal transmission. It will
be  shown  next  that  (i)  proper  kind  of  interleaving,  and  (ii) 
keeping  silence  (inserting  a  dummy  guard  space,  just  at  one 
end  of  each  message  carrying  interval  as  in  [2])  are  the  addi¬ 
tions  to  the  model,  sufficient  for  so  doing.  The  question  will 
be  answered  by  Theorem  1. 

II.  More  on  the  underlying  model 

Infinite  source  population  is  assumed,  with  demands  due  to 
a  Poisson  process  of  given  parameter  A,  called  total  demand 
rate.  One  of  the  sources  is  activated  next  to  each  demand, 
never  active  before.  Just  time  hopping  is  considered,  for  sim¬ 
plicity  and  also  because  of  the  actual  tasks  kept  in  mind  by  the 
author.  A  message  of  1 /km  symbols,  sent  next  to  each  demand, 
is  taking  values  in  GF(q ).  n  =  q  —  1.  Each  of  the  consecu¬ 
tive  m  subblocks  of  each  source  block  of  length  k  are  encoded 
by  means  of  m  distinct  (n,  h)  Reed-Solomon  component  codes 
over  GF{q)  of  the  same  kind.  Along  each  frame  superslots  are 
defined consecutively, each of m + μ slots. (Superslot duration
is defined as time unity. The last μ slots of each superslot are
kept  dummy.)  The  same  binary  hop  sequence  s0  of  length  N, 
of  weight  n,  of  complete  cyclic  order,  and  of  cyclic  correlation 
c  =  1  is  assigned  to  each  potential  source.  Multiple  access 
erasure  channel  is  assumed  with  neither  noise  nor  delay. 
Definition 1 Consider a register step t at which a match is
declared. There is frame front coincidence at t provided frame
fronts from at least two distinct sources occur at t within the
correlator window, within superslot distance (mod N) (from the
rear end of the window (mod N)).

III.  Results 

Choose 

A  =  A!  n  -  k  +  1 


as  activity  threshold.  Consider  any  correlator  step  t  at  which 
match  is  declared.  Denote  by  k'  the  value  of  the  subblock 
length  k  (associated  with  each  component  code)  at  which 
Rvil m  (1  -f  ^)_1  takes  it  largest  possible  value,  given  n, 
m,  N,  and  A' .  (Obviously  k  =  km.)  Denote  by  N'  the  short¬ 
est  possible  hop  sequence  (frame)  length  at  which  decoding  is 
error  free,  at  any  f,  with  match  but  with  neither  frame  front 
coincidence  nor  overflow  (with  respect  to  activity  threshold 


A')-  . 

Denote  by  Ao  the  largest  possible  zero  error  activity  thresh¬ 
old,  given  n,  k  =  k' .  Let  N  =  Nr . 

Lemma  1 

Ao  =  fc\ 

given  any  N ,  n ,  and  k  =  k  . 


Call  peak  factor  the  ratio  (1  -f  <5)  of  Ao  to  A2 N  .  Confine 
the  study  to  0  <  6.  Denote,  at  any  instant  i,  by  Ct  the  con¬ 
figuration  of  all  frame  fronts  that  are  just  window  active  at  t; 
and  by 

¥(dec  err), 

the  decoding  error  probability  at  any  t  with  match,  but  with 
neither  frame  front  coincidence  nor  overflow  with  respect  to 
A  =  A0  —  k’ .  Refer  to 


P (dec  err)'  :=  P (dec  err) 


at  any  register  step  i  at  which  C*  equals  one  of  the  worst  pos¬ 
sible  configurations  (in  the  sense  that  the  number  of  erasures 
along  the  considered  codeword  is  the  possible  largest). 


Theorem  1  Assume  the  considered  model  for  truly  (unslot¬ 
ted)  asynchronous  single  sequence  hopping  (with  0  <  <5,  block 
length  k'  >  2,  the  number  of  frames  u  >  3  next  to  each  de¬ 
mand,  and  kr  >  10.  Then 


(1  -  gi) 


l 

4(1  +  6)(1  + 


<  P (dec  err) 


<(i+ffO(i  +  s»)e(1  +  tfj(1^TjF- 

(Expressions of $g_l$ (l = 1, 2, 3) are precisely given in the
companion preprint. $g_l$ (l = 1, 2, 3) exceed 1, tend to 1 as $k \to \infty$,
and are close to 1 for usual values of k'. Recollect that the
source block length of k q-ary symbols equals mk'.)
Abstract  —  Direct  sequence  spread  spectrum  sys¬ 
tems  using  partial-response  signals  in  a  specular  mul¬ 
tipath  environment  are  investigated.  Instead  of  us¬ 
ing  the  conventional  precoder-decoder  combination 
for  non-spread  partial-response  signals,  a  RAKE  re¬ 
ceiver  is  employed  to  take  advantage  of  the  resolvabil¬ 
ity  provided  by  wide-band  DS/SS  signals  and  the  in¬ 
herent  diversity  of  partial-response  signals.  The  per¬ 
formance  measure  of  interest  is  signal-to-interference 
ratio  (SIR).  Our  results  suggest  partial-response  sig¬ 
nals  perform  well  in  an  outdoor  mobile  DS/SS  system 
with  high  chip  rate.  The  technique  developed  in  this 
paper  can  be  extended  to  any  type  of  partial-response 
signals. 

I.  Introduction  and  System  Model 

Partial-response signals have been widely used in many
non-spread communication and magnetic recording systems
because they allow transmission at the Nyquist rate by
introducing known interference. Since the interference is known,
it  can  be  removed  by  certain  processing.  In  addition,  partial- 
response  signals  confine  all  the  signal  power  to  the  main  lobe. 
This  feature  makes  filter  design  very  straightforward  when 
out-of-band  power  emission  has  to  be  strictly  limited.  Among 
many  variations  of  partial-response  signals,  class  I  and  class  IV 
signals  are  most  widely  used  because  of  their  spectral  shapes 
and  simpler  decoding  operations  at  the  receiver. 

Two  types  of  decoding  algorithms  are  normally  used  for 
partial response systems: symbol-by-symbol decoding and
maximum likelihood sequence detection (PRML). PRML
performs better than symbol-by-symbol decoding. However, the
performance  of  PRML  depends  on  the  size  of  the  decoder 
memory  and  the  decoder  complexity  is  proportional  to  the 
size  of  the  memory.  Several  technical  difficulties  are  usually 
associated  with  conventional  partial  response  systems.  First 
the  receiver  has  to  estimate  the  power  level  of  the  received 
signal  even  when  binary  signaling  is  used.  The  inaccuracy  of 
the  power  level  estimate  of  the  received  signal  degrades  the 
noise  immunity  of  the  decoder.  The  degradation  could  be  sig¬ 
nificant  when  a  large  signal  set  is  used.  Moreover,  channel 
distortion  or  other  types  of  interference  requires  the  receiver 
to  employ  an  equalizer. 

In this paper, a direct-sequence spread-spectrum (DS/SS) system
using class-I partial-response (PR-I) and class-IV partial-response
(PR-IV) signals is considered. The self-interference introduced by
partial-response signaling is treated as a form of multipath
interference with known delays and amplitudes, and a RAKE receiver
is used to take advantage of the known multipath interference. The
main advantage of using a RAKE receiver instead of a conventional
precoder-decoder combination for a system using partial-response
signaling is the reduced complexity of the decoder. As mentioned
above, the conventional precoder-decoder structure needs to estimate
the power level of the received signal, to equalize the channel
distortion and multipath interference, and to use a sequential
decoder to maximize the performance. For a binary partial-response
DS/SS system with a RAKE receiver, however, the decoder does not
need information about the power level of the received signal and
can tolerate channel distortion and multipath interference to a
certain degree without having to equalize the channel. Moreover,
symbol-by-symbol detection should perform fairly well for such a
receiver. The transmitter, channel and receiver models are all
detailed in [1].

II.  Numerical  Results  and  Conclusions 
We calculated the SIRs for a DS/SS system using PR-I, PR-IV, and
filtered rectangular chip waveforms. Here, for a fair comparison,
the filtered rectangular chip is a unit-amplitude pulse low-pass
filtered by an ideal brickwall filter with cutoff frequency equal
to one half of the chip rate, which produces a DS/SS signal of the
same bandwidth as its partial-response counterparts. Our results
show that when random spreading sequences are used, filtered
rectangular chips outperform PR-I and PR-IV by about 0.5-1 dB. On
the other hand, with m-sequences and differential delays longer
than 3 chip durations but no longer than N - 3 chip durations,
where N is the number of chips per bit, partial-response signals
perform better than filtered rectangular chips in certain cases.
This suggests that partial-response signals may be attractive in an
outdoor mobile radio environment with a high chip rate. Another
feature of partial-response signals is that they can be designed to
match the frequency response of the channel or the frequency band
allocations. It may be possible to optimize the system to minimize
the self-interference.

The RAKE receiver structure presented in this paper also makes the
decoding process for other types of partial-response signaling
straightforward, whereas for the conventional precoder-decoder
structure the decoding process becomes cumbersome when the number
of controlled interference terms of the partial-response signaling
is greater than two.

A potential disadvantage of partial-response signals is that they
do not have uniform amplitude, and hence may suffer from non-linear
amplification. Therefore, the transmitter power amplifier must
operate in the linear range.
Abstract - Practical frequency hopped spread spectrum (FHSS)
wireless networks and multitone modulated wireline systems can be
modeled as sets of interference channels. In these systems, it is
desirable to optimize a cost function over the network that includes
the transmission rates, blocking probability, and dropping
probability for users. This optimization can be approximated using
distributed algorithms that do not require explicit communication
between pairs of users. We present one such algorithm that is
designed to quickly identify a suboptimal, but reasonable solution.
The performance of this algorithm is evaluated with simulations of
prototype wireline and wireless systems.

I.  Introduction 

Both frequency hopped wireless networks [3] and multitone
modulated wireline networks [1,2] can be modeled by a gain matrix
plus additive white Gaussian noise. In the wireline case, user pairs
sharing a twisted pair cable interfere with each other through near
end cross talk (NEXT) and far end cross talk (FEXT). Likewise, in a
cell based wireless system, communication pairs formed between
base stations and users interfere because of the shared radio
channel. We assume here that the base station receiver decodes the
received signals from different users independently, as is done in
practice. For both systems, the resulting channel model is a set of
interference channels.

Current digital wireless systems (IS-54 TDMA) and wireline systems
(discrete multitone ADSL) use fixed reuse patterns to guarantee
a minimal signal to interference ratio (SIR) and a minimal level
of service. These reuse patterns take the form of cellular frequency
planning in wireless systems, and fixed transmitter power levels in
wireline systems. Since reuse patterns are designed for the worst
case (cell boundaries for wireless and 1% worst case interferers for
wireline), they are inherently inefficient. Capacity improvements can
be made in both systems by adapting to the actual interference and
avoiding the worst case situation in which two users with high
interference levels share the same channel.

Recent work has shown that high capacity can be achieved in
wireless systems with frequency hopping over orthogonal hopping
patterns [4]. With orthogonal hopping patterns, a user sees a
different set of independent interferers in each hop. Power and bits
can be allocated to those hops where interference is relatively low
and a high SIR can be maintained for all users on the channel. A
similar situation exists for the multitone wireline system, except
that the K channels are accessed in parallel. Certain pairs of users
will interfere strongly. We choose to allocate power among users and
channels to avoid this situation.

II. Cost Function Optimization

Assuming the interfering signals are independent and Gaussian,
the aggregate bit rate over K channels is the average of the
achievable bit rates over the channel set. For the wireline case,
the transmission rate is increased by a factor of K because the
channels are accessed in parallel. The relevant measure of system
performance is a cost function with call blocking, dropping, and
system capacity as arguments. The goal is to optimize the cost over
all users in the network so that the maximum revenue for network
operation can be maintained, subject to specific service constraints.
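
The rate computation described above can be sketched as follows. The per-channel rate log2(1 + SIR) is the standard Shannon expression for Gaussian interference treated as noise; it is an assumption of this example, since the abstract does not state the formula. The function name `aggregate_rate` and the `parallel` flag are likewise hypothetical.

```python
import math


def aggregate_rate(sirs, parallel=False):
    """Aggregate bit rate (bits per symbol) over K channels.

    sirs     -- linear (not dB) SIR on each of the K channels
    parallel -- False: hopped wireless case, rates are averaged over hops
                True:  multitone wireline case, K channels used at once
    Assumes log2(1 + SIR) per channel (Gaussian interference as noise).
    """
    rates = [math.log2(1.0 + s) for s in sirs]
    average = sum(rates) / len(rates)
    return average * len(rates) if parallel else average


hopped = aggregate_rate([1.0, 3.0])                    # average of 1 and 2 bits
multitone = aggregate_rate([1.0, 3.0], parallel=True)  # K times the average
```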

Since the SIRs of the users in the network are all interconnected
by the transmit powers, the optimal choice of the feasible set is
difficult. Any algorithm for optimizing the cost function must
operate jointly over all the users. Our approach is to choose a
suboptimal solution based on admission control. New users rapidly
probe all K channels to determine which can be used without excessive
interference to previously active users. The algorithm is modeled
after [5] but extended to handle multiple constellation sizes and
periodic adaptation of active users. Active users make an effort to
accommodate new users, but only if doing so will allow them to
maintain the transmission rate they achieved when entering the
network. The balance between blocking and dropping probabilities is
controlled by the aggressiveness of new users and the ability of
active users to block new users when necessary. After admission, an
active user will use a distributed power control algorithm [6] to
maintain the SIRs on all allocated channels.
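
To make the last step concrete, here is a minimal sketch of one well-known distributed SIR-tracking power update, in which each user scales its own power by the ratio of its target SIR to its measured SIR. It is offered only as an illustration and is not necessarily the algorithm of [6]. The gain matrix `G`, the noise level, and the targets below are made-up values.

```python
def sir(G, p, noise, i):
    """SIR of user i on one channel; G[i][j] is the gain from transmitter j to receiver i."""
    interference = sum(G[i][j] * p[j] for j in range(len(p)) if j != i)
    return G[i][i] * p[i] / (interference + noise)


def power_control(G, targets, noise, iterations=100, p0=1.0):
    """Synchronous iteration p_i <- (target_i / SIR_i) * p_i.

    Each user needs only its own measured SIR, so no explicit
    communication between user pairs is required.
    """
    p = [p0] * len(targets)
    for _ in range(iterations):
        # The comprehension reads the old vector p for every user.
        p = [targets[i] / sir(G, p, noise, i) * p[i] for i in range(len(p))]
    return p


# Hypothetical two-user example with weak cross-coupling.
G = [[1.0, 0.1], [0.1, 1.0]]
powers = power_control(G, targets=[5.0, 5.0], noise=0.01)
achieved = [sir(G, powers, 0.01, i) for i in range(2)]
```

When the targets are jointly feasible, as in this example, the powers settle at the smallest values that meet every target; infeasible targets make the powers grow without bound, which is one reason admission control precedes this step.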

III.  Conclusions 

At a fundamental level, multitone digital subscriber lines and
frequency hopped wireless networks have formally similar channels
and network cost functions. Optimizing the capacity and utility of
these systems can be achieved by algorithms that operate in a
distributed fashion, using only the knowledge of one's own channel
characteristics and the interference from other users. The
differences between wireline and wireless systems lie in the
magnitude of the interference between user pairs and the associated
costs of blocking and dropping. When these factors are incorporated
into the adaptation algorithms, good performance can be achieved
with both systems.
I. Introduction

The optimal universal code for FSMX sources [1] with respect to
the Bayes redundancy criterion [2] is deduced under the condition
that the model, the probabilistic parameters and the initial state
are unknown. The algorithm is not only Bayes optimal for FSMX
sources but also asymptotically optimal for stationary ergodic
sources. Moreover, the algorithm can be regarded as a generalization
of the Ziv-Lempel algorithm. The basic CTW algorithm needs the
initial context $x_{1-d} x_{2-d} \cdots x_0$, where the finite
constant $d$ is the depth of the context tree, to calculate the
coding probability of $x_1$. Extensions of the CTW algorithm that
address the problems of the initial situation and of the
infinite-depth tree have been proposed in [3]. The optimal algorithm
proposed in this paper gives a solution to these problems from a
different point of view.


$P^I(x_t \mid x^{t-1}, s)$ and $P^S(x_t \mid x^{t-1}, s)$ are defined as follows:

$$P^I(x_t \mid x^{t-1}, s) = \int \cdots \int P(x_t \mid x^{t-1}, \theta^I(s), s)\, P(\theta^I(s) \mid x^{t-1}, s)\, d\theta^I(s), \qquad (2)$$

$$P^S(x_t \mid x^{t-1}, s) = \int \cdots \int P(x_t \mid x^{t-1}, \theta(s), s)\, P(\theta(s) \mid x^{t-1}, s)\, d\theta(s), \qquad (3)$$

where $P(\theta^I(s) \mid x^{t-1}, s)$ is the posterior probability of $\theta^I(s)$
given $(x^{t-1}, s)$.

Theorem 1 Let $q(s \mid x^{t-1})$ be the posterior probability of $q(s)$
given $x^{t-1}$. The adaptive coding probability of the Bayes code with
respect to Formula (1) is given by the following recursion formula:


II. The Probability of a Sequence from an FSMX Source

If arithmetic coding is used for universal coding, the main
problem is deciding the coding probability $P_c(x^n)$ or
$P_c(x_t \mid x^{t-1})$, which is the probability assumed in order to
code a source sequence $x^n = x_1 x_2 \cdots x_n$, where $x_i \in A$.
Let $m$ be an FSMX source model. The state set of $m$ is represented
by an $l$-ary complete tree $T(m)$ called a context tree. Let $S(m)$
be the set of all states in $m$. $S(m)$ corresponds to the set of all
leaf nodes in $T(m)$. The state of a model $m$ at time $t$ is
determined by the postfix of a source sequence $x^t$. This mapping
from $x^t$ to a state $s \in S(m)$ is denoted by $f_m(x^t)$. The node
corresponding to a postfix $x_{t-j}^{t}$ is denoted by
$s(x_{t-j}^{t})$. The set of all interior nodes of the tree $T(m)$ is
denoted by $S^I(m)$.

For efficiency of the calculation of Bayes coding, we introduce
a parametric representation for the probability of FSMX sources.
Let $\theta(m)$ be the set of transition probabilities
$\{P(x \mid s) \mid x \in A, s \in S(m)\}$. Moreover, the initial
transition probability
$\theta^I(m) = \{P^I(x \mid s) \mid x \in A, s \in S^I(m)\}$ is
introduced. The probability of a sequence $x^t$ is represented by

$$P(x^t \mid \theta(m), \theta^I(m), m) = P^I(x_1 \mid \lambda) \cdots P^I(x_J \mid s(x_1^{J-1})) \prod_{i=J}^{t-1} P(x_{i+1} \mid f_m(x^i)), \qquad (1)$$

where $J = \arg\min_j \{ s(x_1^j) \mid s(x_1^j) \in S(m) \}$.

III. A Recursive Calculation of the Coding Probability

The Bayes optimal redundancy code for hierarchical source models
such as FSMX models, given an initial state, was presented in our
previous paper. In this section we give the Bayes code of the FSMX
models represented by Formula (1) for the case in which the initial
condition is unknown. The recursion formulas for the adaptive coding
probability of the code are derived by using special classes of the
prior: $q(s)$, $P(\theta(s) \mid s)$ [4]
and
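$$P_c(x_t \mid x^{t-1}) = q(x_t \mid x^{t-1}, s_\lambda),$$

$$q(x_t \mid x^{t-1}, s) = \begin{cases} (*1) & \text{if } s = s(x_1^{t-1}), \\ (*2) & \text{otherwise,} \end{cases} \qquad (4)$$

(*1): $q(s \mid x^{t-1})\, P^S(x_t \mid x^{t-1}, s) + (1 - q(s \mid x^{t-1}))\, P^I(x_t \mid x^{t-1}, s)$,

(*2): $q(s \mid x^{t-1})\, P^S(x_t \mid x^{t-1}, s) + (1 - \ldots$

where $s'$ is a child node of $s$, and $s', s \in \{ s(x_{t-j}^{t-1}) \mid j = 1, \ldots \}$.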


IV.  The  proposed  algorithm 

Using Theorem 1, we propose a practical Bayes coding algorithm
for FSMX sources. The context tree used in the algorithm, which is
not always an $l$-ary complete tree, grows as the length of the
source sequence increases. The set of paths from the root to the
leaves in the context tree with respect to the sequence $x^t$
contains all parsing blocks of $x^t$ produced by the Z-L algorithm.
This means that the FSMX sources implicitly assumed in the Z-L
algorithm are included in the context trees of the proposed
algorithm. Whereas the Z-L algorithm assumes a single FSMX source
for parsing, our algorithm uses a mixture model over a set of FSMX
sources that includes that single FSMX source. The proposed
algorithm can therefore be regarded as a generalization of the Z-L
algorithm.

References

[1] J. Rissanen. Universal modeling and coding. IEEE Trans. Inf.
Theory, 27(1):12-23, Jan 1981.

[2] L. D. Davisson. Universal noiseless coding. IEEE Trans. Inf.
Theory, 19(6):783-795, Nov 1973.

[3] F. M. J. Willems. Extensions to the context tree weighting
method. In Proc. Int. Symp. of Information Theory, page 387,
1994.

[4] T. Matsushima and S. Hirasawa. A Bayes coding algorithm
using context tree. In Proc. Int. Symp. of Information Theory,
page 386, 1994.


A  CTW  Scheme  for  Some  FSM  Models 

Joe  Suzuki 

Dept  of  Mathematics,  Osaka  University, 

Toyonaka,  Osaka  560,  Japan 


Abstract - This paper addresses a modified version of the CTW
(Context Tree Weighting) method which deals with some FSM (Finite
State Machine) models as well as the FSMX (FSM X) models at little
additional computational expense in encoding/decoding.

The FSMX model is an FSM model $g \in G_D$ ($D > 0$: integer) in
which each state $s$ that the data $x_{t+1}$, $t = 0, 1, \cdots, n-1$
($n > 1$: integer), to be encoded depends on is expressed as the
shortest sequence $x_{t-d+1} x_{t-d+2} \cdots x_t$ ($d \le D$) such
that no state $s \in S(g)$ is a postfix of any other state, where
$G_D$ is the set of the models whose depth $d$ is at most $D$, and
$S(g)$ is the set of the states for $g \in G_D$. In general, the
length $l(x_1^n \mid x^0_{-\infty})$ given
$x^0_{-\infty} \in X^\infty$ is expressed by

$$l(x_1^n \mid x^0_{-\infty}) = -\log\Bigl\{ \sum_{g \in G_D} W(g) \prod_{s \in S(g)} Q^s(x_1^n \mid x^0_{-\infty}) \Bigr\}, \quad x_1^n \in X^n,$$

where $W(g)$, $g \in G_D$, satisfies $\sum_{g \in G_D} W(g) \le 1$
(model weighting technique). Then, for each model $g \in G_D$, the
probability

$$Q^s(x_1^n \mid x^0_{-\infty}) = \prod \frac{n_t[x_{t+1}, s] + 1/2}{n_t[s] + a/2}, \quad a = |X|,$$

is assigned to each state $s \in S(g)$, where the product is taken
over $t = 0, 1, \cdots, n-1$ such that the state at time instance
$t+1$ is $s \in S(g)$, and $n_t[x_{t+1}, s]$ and $n_t[s]$ are
respectively the number of occurrences of $x_{t+1} \in X$ given
$s \in S(g)$ and that of $s \in S(g)$ in $t = 0, 1, \cdots, n-1$.

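The CTW gives the length
$l(x_1^n \mid x^0_{-\infty}) = -\log P^\lambda(x_1^n \mid x^0_{-D+1})$,
$x_1^n \in X^n$, by setting $x^0_{-D+1} \in X^D$ and constants
$0 < \beta_s < 1$ for $s \in \bigcup_{0 \le d \le D-1} X^d$
($\beta_s = 0$ for $s \in X^D$), and applying the following equation
recursively:

$$P^s(x_1^n \mid x^0_{-D+1}) = \begin{cases} (1 - \beta_s)\, Q^s(x_1^n \mid x^0_{-D+1}) + \beta_s \prod_{x \in X} P^{xs}(x_1^n \mid x^0_{-D+1}) & (0 \le |s| < D) \\ Q^s(x_1^n \mid x^0_{-D+1}) & (|s| = D) \end{cases} \qquad (1)$$

where $xs \in \bigcup_{1 \le d \le D} X^d$ is the concatenation of
$x \in X$ and $s \in \bigcup_{0 \le d \le D-1} X^d$, and $D > 0$ is
some constant. Then, $W(g)$, $g \in G_D$, are expressed as
$W(g) = \prod_{s \in S(g)} (1 - \beta_s) \prod_{t \in T(g) - S(g)} \beta_t$,
where $T(g)$ is the set of all postfixes of $s \in S(g)$ including
$s$ itself. For example, $W(g)$, $g \in G_D$, are obtained for five
models ($D = 2$) and $\beta_s = 1/2$, $s \in \{\lambda\} \cup X$, as
depicted in Figure 1. Notice that just $O(Dn)$ computation and
$O(a^D)$ storage are needed for the encoding/decoding, although the
depth is bounded by the finite constant $D$ [1].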
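
The five-model example can be checked numerically. The sketch below enumerates the FSMX state sets for a binary alphabet with D = 2 and evaluates W(g) as the product of (1 - beta_s) over the states and beta_t over their proper postfixes, with beta_s = 0 at depth D. A binary alphabet is assumed because it is the case that yields five models; the helper names `models` and `weight` are illustrative.

```python
from fractions import Fraction
from itertools import product


def models(s, D, alphabet):
    """Yield the state set of every context tree rooted at node s (depth <= D).

    A child of node s is written xs: the new symbol is prepended,
    so every state has its parent as a postfix.
    """
    yield frozenset([s])                     # stop here: s itself is a state
    if len(s) < D:
        subtrees = [list(models(x + s, D, alphabet)) for x in alphabet]
        for combo in product(*subtrees):     # split s into all of its children
            yield frozenset().union(*combo)


def weight(states, D, beta):
    """W(g) = prod_{s in S(g)} (1 - beta_s) * prod_{t in T(g) - S(g)} beta_t."""
    w = Fraction(1)
    interior = set()
    for s in states:
        if len(s) < D:                       # beta_s = 0 at depth D, factor 1
            w *= 1 - beta
        for k in range(1, len(s) + 1):
            interior.add(s[k:])              # proper postfixes, including ""
    for _ in interior:
        w *= beta
    return w


all_models = list(models("", 2, "01"))
weights = [weight(g, 2, Fraction(1, 2)) for g in all_models]
```

The weights come out as 1/2 for the memoryless model and 1/8 for each of the other four, so they sum to one.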

In this paper, we remove the constraint in the CTW that the source
should be an FSMX model [1]. Although the proposed scheme does not
yet cover the general FSM models with bounded depth $D$, the upper
bound of the individual redundancy coincides with that of the
original scheme except for the length of model $g \in G_D$
(Theorem 1). In addition, the computational complexity is shown to
be $O(2^D n)$ (Theorem 2). Although the $O(2^D n)$ computation may
seem to be enormous compared with that of the original scheme,
$O(|G_D| n)$ computation is required to realize the model weighting
technique for
general  FSM  models.  The  number  of  possible  models  which 
we  deal  with  in  this  paper  is  proved  to  be  |Gd|  =  0(2a  ) 
(Theorem  3). 


The model class we deal with is such that each state which
$x_{t+1}$, $t = 0, 1, \cdots, n-1$, depends on is expressed as an
element in $(X \cup \{*\})^D$ rather than one in
$\bigcup_{0 \le d \le D} X^d$, where "*" refers to a don't care
symbol meaning that the state does not depend on the value at that
position. Then, Eq. (1) is replaced by the following recursive
equation


p-mv-o.,)"  n <osw<j>) 

xex 

{  Qs(x? \x°_D+1)  (M  =  D) 

(2) 

Then,  VT(3),  (/€Gd,  are  expressed  as  W(g)  =  ILe^sT  " 
fjr)  l\teR{r])  fit,  where  R(g)  is  the  set  of  r  €  U0<d<p-i(X  U 
{*})d  such  that  concatination  *r  is  a  postfix  of  any  state  in 
model $g \in G_D$. For example, $W(g)$, $g \in G_D$, are obtained
for six models ($D = 2$) and $\beta_s = 1/2$,
$s \in \{\lambda\} \cup X$, as depicted in Figure 2. Note that any
FSMX model can be expressed as a specific case where, once "*" is
emitted by some node, the "*" does not stop until the leaf; thus a
model such as Figure 2 (e) is excluded in the original scheme.

The procedure of the update at time instance $t+1$,
$t = 0, 1, \cdots, n-1$, is summarized as follows: (1) replace some
positions of the $a$-nary sequence $x^t_{t-D+1} \in X^D$ with "*"s
to obtain $2^D$ $(a+1)$-nary sequences of length $D$; (2) update
$Q^s(x_1^{t+1} \mid x^0_{-D+1})$, $n_t[x_{t+1}, s]$,
$x_{t+1} \in X$, and $n_t[s]$ for the $2^D$ states
$s \in (X \cup \{*\})^D$ generated in step 1; and (3) generate
$P^s(x_1^{t+1} \mid x^0_{-D+1})$ by recursively applying Eq. (2) to
the updated $P^{xs}(x_1^{t+1} \mid x^0_{-D+1})$,
$x \in X \cup \{*\}$, until $P^\lambda(x_1^{t+1} \mid x^0_{-D+1})$
is obtained.
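
The first step of the update, generating the 2^D masked versions of the current length-D context, can be sketched as below. The function name `dont_care_states` and the representation of contexts as strings are assumptions made for illustration.

```python
from itertools import product


def dont_care_states(context, wildcard="*"):
    """Return the 2**D states obtained by replacing any subset of the
    D context symbols with the don't care symbol."""
    choices = [(symbol, wildcard) for symbol in context]
    return ["".join(state) for state in product(*choices)]


states = dont_care_states("01")   # D = 2 gives 4 states
```

Only these 2^D states need their counts and probabilities updated at each time instance, which is where the O(2^D n) total cost comes from.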
Abstract - We investigate the effect of time reversal on tree
models of finite-memory processes. This is motivated in part by the
following simple question that arises in some data compression
applications: when trying to compress a data string using a
universal source modeler, can it make a difference whether we read
the string from left to right or from right to left? We characterize
the class of finite-memory two-sided tree processes, whose
time-reversed versions also admit tree models. Given a tree model,
we present a construction of the tree model corresponding to the
reverse process, and we show that the number of states in the
reverse tree might be, in the extreme case, quadratic in the number
of states of the original tree. This answers the above motivating
question in the affirmative.

I.  Summary 

Tree models [2] provide a reduced parametrization of finite-memory
(Markov) sources, which can be efficiently and optimally modeled
using Algorithm Context [1, 2], thus allowing a model size that is
not necessarily exponential in the Markov order. In this work, we
investigate the effects of time reversal on the structure of the
minimal tree model of a finite-memory source. Time reversal of
stationary Markov processes is well understood in the literature.
In particular, it is known that time reversal preserves both the
order and the entropy of a stationary Markov process (see, e.g.,
[3, Ch. 4]). This still leaves the question of the effect on the
minimal tree parametrization open, and our interest in it stems from
its implications (through model cost) on the rate of convergence to
the entropy of a universal modeler.

Let $A$ be an alphabet of $\alpha$ symbols, and let $\lambda$ denote
the empty string. For a string $u = u_1 u_2 \ldots u_k \in A^*$, let
$\bar{u} = u_k u_{k-1} \ldots u_1$ denote the reverse of $u$. A
process (or information source) over $A$ is defined as a probability
assignment $P : A^* \to [0,1]$ satisfying $P(\lambda) = 1$ and
$P(u) = \sum_{a \in A} P(ua)$ $\forall u \in A^*$. Consider an
arbitrary sequence $x^n = x_1 x_2 \cdots x_n$ over $A$. A process $P$
has the finite-memory property (see, e.g., [2]) if the function
$p(a \mid x^n) = P(x^n a)/P(x^n)$ (a conditional probability by the
properties of $P$) satisfies

$$p(\cdot \mid x^n) = p(\cdot \mid u\, s(x^n)) \quad \forall u \in A^*, \qquad (1)$$

where $s(x^n) = x_n x_{n-1} \cdots x_{n-\ell+1}$ for some $\ell$,
$0 \le \ell \le m$, not necessarily the same for all $x^n$ (the case
$\ell = 0$ is interpreted as defining the empty string). Such a
string $s(x^n)$ is called a state. In a minimal representation of
the model, $s(x^n)$ is the shortest suffix of $x^n$ (or context)
satisfying (1). The set $S$ of states defines a complete
$\alpha$-ary tree $T$, with the branches labeled by symbols of the
alphabet, and $S$ as the set of leaves. The pair
$\mathcal{T} = \langle T, p(\cdot \mid \cdot) \rangle$ is called a
tree model for the process $P$, which is called minimal if for every
node $w$ in $T$ such that all its successors $wb$ are leaves, there
exist $a, b, c \in A$ such that $p(a \mid wb) \ne p(a \mid wc)$.
Conversely, we prove that given a tree model
$\mathcal{T} = \langle T, p(\cdot \mid \cdot) \rangle$, there exists
one and only one two-sided tree process $P$ modeled by
$\mathcal{T}$, such that the reverse assignment
$\bar{P}(u) = P(\bar{u})$ is also a finite-memory process (called
also the reverse process of $P$).

The reverse process $\bar{P}$ admits a minimal tree model
$\langle T_{\bar{P}}, \bar{p}(\cdot \mid \cdot) \rangle$. The
underlying tree $T_{\bar{P}}$ in this model depends on both $T$ and
$p(\cdot \mid \cdot)$. In contrast, we define the reverse tree of
$T$ as $\bar{T} = \bigcup_{\text{all } p(\cdot \mid \cdot)} T_{\bar{P}}$,
which depends solely on $T$. The tree $\bar{T}$ is the minimal
representation for the reverses of all the processes whose minimal
tree models have $T$ as underlying graph. We can also see $\bar{T}$
as the minimal tree of a reverse process $\bar{P}_z$, where $P_z$ is
a "symbolic process" with a minimal tree model
$\langle T, p_z(\cdot \mid \cdot) \rangle$ in which we have
substituted $(\alpha - 1)$ symbolic indeterminates $z_{a,s}$ for the
free parameters $p(a \mid s)$ at each state $s$. Notice that, while
there is a symmetry between $T$ and $T_{\bar{P}}$, so that
$(T_{\bar{P}})_{\bar{P}} = T$, no such symmetry exists between $T$
and $\bar{T}$, and we might have $\bar{\bar{T}} \ne T$. We present a
combinatorial construction of $\bar{T}$, and use it as a tool to
bound the size difference between $T$ and $T_{\bar{P}}$, noting that
the latter is a subtree of $\bar{T}$. The construction and proofs
rely on the characterization of tree models that have the
finite-state machine property, i.e., whose leaves uniquely define a
next-state function. Let $|T|_L$ denote the number of leaves of a
complete $\alpha$-ary tree $T$.
Theorem 1. (a) Let $T$ be such that $|T|_L = N$. Then,

$$|\bar{T}|_L \le \frac{1}{2(\alpha - 1)} N^2 + O(N).$$

(b) For every $N > 0$ such that $N \equiv 1 \bmod (\alpha - 1)$,
there exists a complete $\alpha$-ary tree $T$ with $|T|_L = N$, such
that $|\bar{T}|_L$ attains the upper bound of part (a) up to an
additive term $O(N)$.

Corollary 1. Let $P$ be a process with minimal tree model
$\langle T, p(\cdot \mid \cdot) \rangle$, and let $N = |T|_L$. Then,
the minimal tree model
$\langle T_{\bar{P}}, \bar{p}(\cdot \mid \cdot) \rangle$ of
$\bar{P}$ satisfies

$$\sqrt{2(\alpha - 1) N} - O(1) \le |T_{\bar{P}}|_L \le \frac{1}{2(\alpha - 1)} N^2 + O(N).$$

It follows from Corollary 1 that, when using tree sources to
model data, there might be significant differences between the
size of the tree estimated when reading the data from left to
right and the one estimated from right to left. These differences,
in turn, affect the model cost incurred by the modeling
algorithm. This behavior is a consequence of the choice of
class of models targeted by the algorithm, since the number
of free parameters determining the reverse process is identical
to the number of parameters in the original process. On the
other hand, it is this choice of model class that allows for an
efficient estimation algorithm.
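
The restriction on N in Theorem 1(b) reflects a basic counting fact: a complete tree in which every internal node has exactly alpha children has N = 1 + k(alpha - 1) leaves, where k is the number of internal nodes. The sketch below checks this for a set of leaf strings. It assumes the leaves are given as strings over an alphabet of exactly alpha symbols; the function names are hypothetical.

```python
from fractions import Fraction


def is_complete_tree(leaves, alpha):
    """True if the leaf strings form a complete alpha-ary tree.

    Uses the fact that a finite prefix-free set with Kraft sum 1 fills
    the tree. Assumes the strings use an alphabet of alpha symbols.
    """
    leaves = set(leaves)
    for u in leaves:
        for k in range(len(u)):
            if u[:k] in leaves:          # some leaf is a proper prefix of u
                return False
    return sum(Fraction(1, alpha ** len(u)) for u in leaves) == 1


def internal_nodes(leaves):
    """All proper prefixes of the leaves, i.e. the internal nodes."""
    return {u[:k] for u in leaves for k in range(len(u))}


ternary = ["0", "1", "20", "21", "22"]
n_leaves = len(ternary)
n_internal = len(internal_nodes(ternary))
```

For the ternary example, two internal nodes give 1 + 2(3 - 1) = 5 leaves, consistent with the congruence in part (b).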
In this abstract, we give an approximation formula for the
predictive Bayes code for the FSMX' models (subspaces of
Markov models). Moreover, we empirically show that the
code using our approximation formula with the Jeffreys prior
gives shorter code length than the one using the
Laplace estimator for the first order Markov models.

Let $A$ be a set of $m$ symbols. Suppose that $A$ contains the
symbol $\omega$ and let $A'$ denote $A - \{\omega\}$. Let $T$ be a
subset of $A^*$. When, for all $s \in T$, any postfix of $s$ belongs
to $T$ (e.g., the postfixes of $a_1 a_2$ are $a_1 a_2$, $a_2$ and
$\lambda$ (the null sequence)), $T$ is called a context tree. Define
$\partial T = \{as \mid a \in A, s \in T\} \cup \{\lambda\} - T$.
Each element of $\partial T$ is called a leaf of $T$ or a context.
For any $a^t = a_1 a_2 \ldots a_t$
($t \ge d(T) = \max_{s \in \partial T} |s|$), let $s(a^t)$ denote a
postfix of $a^t$ which belongs to $\partial T$. $s(a^t)$ is called
the context of $a^t$ defined by $T$. Now, we define the FSMX' source
$p(\cdot \mid \eta, T)$ as

$$p(a^N \mid \eta, T) = p(a^{d(T)} \mid \eta, T) \prod_{t = d(T)+1}^{N} \eta^{s(a^{t-1})}_{a_t},$$

where $\eta^s_a$ denotes in general the probability that $a$ is
produced at the context $s$ (i.e., $\eta^s_a \ge 0$ and
$\sum_{a \in A} \eta^s_a = 1$ hold) and $p(a^{d(T)} \mid \eta, T)$
denotes the initial probability determined by the stationary
probabilities. Let $\eta$ be the $(|A| - 1) \cdot |\partial T|$-
dimensional vector whose components are $\eta^s_a$
($s \in \partial T$, $a \in A'$) and let $H(T)$ denote the range of
$\eta$. We can write

$$p(a^N \mid \eta, T) = p(a^d \mid \eta, T) \prod_{s \in \partial T} \prod_{a \in A} (\eta^s_a)^{n^s_a},$$

where $n^s_a$ denotes the number of times $a$ is generated at the
context $s$ in the sequence $a_{d+1} \ldots a_N$, and we let
$n_s = \sum_{a \in A} n^s_a$. The FSMX' model $M(T)$ is defined as
$M(T) = \{p(\cdot \mid \eta, T) \mid \eta \in H(T)\}$. (When
$\forall s \in \partial T$ $\forall a \in A$ $sa \notin T$ holds,
$M(T)$ is called an FSMX model.) By introducing another parameter
$\theta$, $p(a^N \mid \eta)$ can be rewritten as follows.

$$p(a^N \mid \eta, T) = p(a^d \mid \eta, T) \prod_{s \in \partial T} \exp\Bigl( n_s \Bigl( \sum_{a \in A'} \theta^s_a \hat{\eta}^s_a - \psi(\theta^s) \Bigr) \Bigr) \qquad (1)$$

where we let $\theta^s_a = \ln(\eta^s_a / \eta^s_\omega)$,
$\hat{\eta}^s_a = n^s_a / n_s$, and
$\psi(\theta^s) = -\ln \eta^s_\omega = \ln(1 + \sum_{a \in A'} \exp \theta^s_a)$.
('ln' denotes the natural logarithm.) We let $\Theta(T)$ denote the
range of $\theta$ as $\eta$ varies over $H(T)$. Note that a class of
probability distributions written as
$\exp(n_s(\sum_{a \in A'} \theta^s_a \hat{\eta}^s_a - \psi(\theta^s)))$
is called an exponential family. We use $p(\cdot \mid \theta, T)$ as
a shorthand notation for $p(\cdot \mid \eta(\theta), T)$.
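
The rewriting in expression (1) rests on a per-context identity: the product of the transition probabilities raised to their counts equals exp(n_s (sum over a in A' of theta_a * eta_hat_a - psi(theta))), with theta_a = ln(eta_a / eta_omega) and psi(theta) = -ln(eta_omega). The sketch below checks the identity numerically for one context; the counts and probabilities are made-up values and the function names are illustrative.

```python
import math


def product_form(counts, eta):
    """prod_a eta_a ** n_a for a single context."""
    return math.prod(eta[a] ** counts[a] for a in counts)


def exponential_form(counts, eta, omega):
    """exp(n_s * (sum_{a != omega} theta_a * eta_hat_a - psi(theta)))."""
    n_s = sum(counts.values())
    theta = {a: math.log(eta[a] / eta[omega]) for a in eta if a != omega}
    eta_hat = {a: counts[a] / n_s for a in theta}
    psi = math.log(1.0 + sum(math.exp(t) for t in theta.values()))
    inner = sum(theta[a] * eta_hat[a] for a in theta) - psi
    return math.exp(n_s * inner)


# Hypothetical ternary context with reference symbol omega = "0".
counts = {"0": 3, "1": 5, "2": 2}
eta = {"0": 0.2, "1": 0.5, "2": 0.3}
left = product_form(counts, eta)
right = exponential_form(counts, eta, omega="0")
```

The two values agree up to floating-point rounding, because the empirical frequencies sum to one and psi(theta) equals -ln(eta_omega).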

Next, we define the Bayes code for $M(T)$. We fix a context tree
$T$ and let $p(\cdot \mid \theta)$ denote
$p(\cdot \mid \theta, T)$. We assume a prior $w(\theta) d\theta$
over $\Theta(T)$. Then, the predictive Bayes code with prior $w$ is
given by
$p_w(a_{N+1} \mid a^N) = \int p(a_{N+1} \mid a^N, \theta)\, w(\theta \mid a^N)\, d\theta = \int \eta^{s(a^N)}_{a_{N+1}}\, w(\theta \mid a^N)\, d\theta$,
where $w(\theta \mid a^N)$ denotes the posterior density of
$\theta$. Now, we can state our main result.


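Theorem 1 Let $w(\theta)$ be the prior defined on the measure $d\theta$.
Under a certain weak condition, for every $a \in A'$,

$$p_w(a \mid a^N) = \hat{\eta}^{s_c}_a + \frac{1}{n_{s_c}} \frac{\partial \ln\bigl(p(a^d \mid \hat{\theta})\, w(\hat{\theta})\bigr)}{\partial \theta^{s_c}_a} + o\!\left( \frac{\sqrt{\ln N}}{n_{s_c} \sqrt{N}} \right) \qquad (2)$$

holds, where $s_c$ and $\hat{\theta}$ denote $s(a^N)$ and $\theta(\hat{\eta})$, respectively.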


Remark: The key of the proof is expression (1). This extends
the approximation formula for the Bayes code for any (i.i.d.)
exponential family given in [3].

We let $\tilde{\eta}$ denote the first term plus the second term of
(2). Then we can use $\tilde{\eta}^{s(a^N)}_{a_{N+1}}$ as an
approximation formula for $p_w(a_{N+1} \mid a^N)$. Let $w_J$ denote
the Jeffreys prior, which is defined as
$w_J(\theta) d\theta = (\det J(\theta))^{1/2} d\theta / c$ ($c$ is a
normalization constant and $J(\theta)$ is the Fisher information
matrix with respect to $\theta$). We refer to $p_{w_J}$ as the
Jeffreys code. It is known that $p_{w_J}$ for the i.i.d. case (i.e.,
$T = \emptyset$) is asymptotically minimax in terms of redundancy
([1]) and almost equals the Laplace estimator, which is used in
CONTEXT [2] and the CTW method [4].

Now, we compare our approximation formula for $w_J$ with
the Laplace estimator. Let $A = \{0, 1\}$ ($\omega = 0$) and
$T = \{\lambda\}$ ($\partial T = \{0, 1\}$). Suppose that
$s(a^N) = 0$. By Theorem 1, the approximation of the Jeffreys code
for this case is given by

-i  .1  ,  1  /.  ,  ,  2^(1  -  fil)s  ,Q, 

Vo=Vo  +  -{l+a,-1.5Vo-  ^+.o  )•  W 

Note  that  the  difference  between  770  and  the  Laplace  estimator 
(7^0  +  0.5)/(no  +  1)  equals  Q(l/n0)- 

We have compared the redundancy of the code using (3) with that of the code using the Laplace estimator (denoted $p_L$) by computer simulation. In general, the redundancy of a code $q$ is defined as $R_N(\theta, q) = E_\theta\bigl(-\log q(a^N) - (-\log p(a^N \mid \theta))\bigr)$, where $q(a^N)$ is the block probability given to $a^N$ by $q$ and $E_\theta$ denotes the expectation with respect to $p(\cdot \mid \theta)$. ('log' denotes the logarithm to the base 2.) We have estimated the expectation with respect to $p(\cdot \mid \theta)$ by performing a large number of trials using pseudo-random numbers. We show the result with $N = 50$ in Table 1. The number of trials is 1000. In each cell, the left-hand and right-hand entries denote $R_N(\theta, p_{w_J})$ and $R_N(\theta, p_L)$, respectively. The vertical and horizontal axes correspond to the values of $\eta_0$ and $\eta_1$ of the actual source, respectively. We can see that $R_N(\theta, p_{w_J}) < R_N(\theta, p_L)$ holds for all cases. This seems to support our conjecture ([5]) that the Jeffreys code is minimax for FSMX models as well.

Table 1: Redundancy (each cell: $p_J$ / $p_L$)

          0.1            0.5            0.9
0.1       3.59 / 3.72    3.52 / 3.72    3.71 / 4.01
0.5                      3.44 / 3.83    3.36 / 3.78
0.9                                     3.41 / 3.86
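
As a rough illustration of the redundancy definition used above, the following sketch computes the expected redundancy exactly for the simpler i.i.d. binary case rather than by random trials. The function names and the add-alpha estimator with `alpha = 0.5` are assumptions made for this example; it does not implement the approximation formula (3) or the FSMX setting of the experiment.

```python
from math import comb, lgamma, log, log2


def add_alpha_log2_prob(n0: int, n1: int, alpha: float = 0.5) -> float:
    """log2 of the block probability that sequential add-alpha estimation
    assigns to any binary sequence with n0 zeros and n1 ones."""
    n = n0 + n1
    ln_q = (lgamma(n0 + alpha) + lgamma(n1 + alpha) + lgamma(2 * alpha)
            - 2 * lgamma(alpha) - lgamma(n + 2 * alpha))
    return ln_q / log(2)


def redundancy(theta: float, n: int, alpha: float = 0.5) -> float:
    """Expected redundancy E[-log2 q(a^N) + log2 p(a^N | theta)] for an
    i.i.d. source with P(zero) = theta, where 0 < theta < 1."""
    total = 0.0
    for k in range(n + 1):  # k = number of zeros in the block
        weight = comb(n, k) * theta ** k * (1 - theta) ** (n - k)
        log_p = k * log2(theta) + (n - k) * log2(1 - theta)
        total += weight * (log_p - add_alpha_log2_prob(k, n - k, alpha))
    return total


if __name__ == "__main__":
    for theta in (0.1, 0.5, 0.9):
        print(theta, round(redundancy(theta, 50), 3))
```

Because the block probability of an add-alpha estimator depends only on the symbol counts, the expectation reduces to a sum over the number of zeros, which avoids sampling noise.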

Our  approximation  formula  requires  not  only  the  tz“c’s  but 
also  the  n*'s  for  all  s  E  dT.  On  the  other  hand,  the  Laplace 
estimator  can  be  calculated  based  on  the  n^’s  alone.  Both 
CONTEXT  and  CTW  methods  make  use  of  such  property  of 
the  Laplace  estimator.  Hence,  there  is  a  difficulty  in  introduc¬ 
ing  our  formula  to  CONTEXT  or  CTW. 
I.  Introduction

Markov  chain  (N-gram)  source  models  for  natural  language 
were  explored  by  Shannon  and  have  found  wide  application 
in  speech  recognition  systems.  However,  the  underlying  lin¬ 
ear  graph  structure  is  inadequate  to  express  the  hierarchical 
structure  of  language  necessary  for  encoding  syntactic  infor¬ 
mation.  Context-free  language  models  which  generate  tree 
graphs  are  a  natural  way  of  encoding  this  information,  but 
lack  the  modeling  of  interword  dependencies. 

In this paper, we consider a hybrid tree/chain graph structure which has the advantage of incorporating lexical dependencies in syntactic representations. Two Markov random field probability measures are derived on these tree/chain graphs from the maximum entropy principle.

II.  Stochastic  Context-Free  Grammars 

A stochastic context-free grammar $G$ is specified by the quintuple $\langle V_N, V_T, R, S, P \rangle$, where $V_N$ is a finite set of non-terminal symbols, $V_T$ is a finite set of terminal symbols, $R$ is a set of rewrite rules, $S$ is a start symbol in $V_N$, and $P$ is a parameter vector. If $r \in R$, then $P_r$ is the probability of using the rewrite rule $r$.

An important measure is the probability of a derivation tree $T$. Using ideas from the random branching process literature [1, 4], we specify a derivation tree $T$ by its depth $L$ and the counting statistics $z_l(i,k)$, $l = 1, \ldots, L$, $i = 1, \ldots, |V_N|$, and $k = 1, \ldots, |R|$. The counting statistic $z_l(i,k)$ is the number of non-terminals $\sigma_i \in V_N$ rewritten at level $l$ with rule $r_k \in R$. With these statistics the probability of a tree $T$ is given by

$$\pi(T) = \prod_{l=1}^{L} \prod_{i=1}^{|V_N|} \prod_{k=1}^{|R|} P_{r_k}^{\,z_l(i,k)} \qquad (1)$$

In this model, the probability of a word string $W_{1,N} = w_1 w_2 \ldots w_N$, $\beta(W_{1,N})$, is given by

$$\beta(W_{1,N}) = \sum_{T \in \mathrm{Parses}(W_{1,N})} \pi(T) \qquad (2)$$

where $\mathrm{Parses}(W_{1,N})$ is the set of parse trees for the given word string. For an unambiguous grammar, $\mathrm{Parses}(W_{1,N})$ consists of a single parse.


III.  Markov  Random  Field  Models 

We now consider adding bigram relative frequencies as constraints on our stochastic context-free trees, inducing linear constraints on the leaves of the CF tree. For a given word string $W_{1,N} = w_1 w_2 \cdots w_N$, the relative frequency of the word pair $v_i v_j$ is

$$C_{v_i v_j}(W_{1,N}) = \frac{1}{N-1} \sum_{k=1}^{N-1} \mathbf{1}_{v_i v_j}(w_k, w_{k+1}) \qquad (3)$$

where $v_i, v_j \in V_T$.
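
The bigram relative frequency in (3) can be sketched in a few lines. The function name and the dictionary return type are choices made for this illustration only.

```python
from collections import Counter


def bigram_relative_frequencies(words):
    """Return {(v_i, v_j): C_{v_i v_j}} for a word string w_1 ... w_N,
    i.e. the count of each adjacent pair divided by N - 1."""
    n = len(words)
    if n < 2:
        return {}
    counts = Counter(zip(words, words[1:]))
    return {pair: count / (n - 1) for pair, count in counts.items()}


if __name__ == "__main__":
    print(bigram_relative_frequencies(["the", "dog", "saw", "the", "dog"]))
```

Pairs that never occur are simply absent from the result, which corresponds to a relative frequency of zero.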

Theorem 1 [3]

The probability distribution on trees, $p(T)$, minimizing the relative entropy with respect to the distribution $\pi(T)$ defined by a stochastic context-free grammar,

$$\sum_{T} p(T) \log \frac{p(T)}{\pi(T)} \qquad (4)$$

subject to the bigram constraints $\{E[C_{v_i v_j}(W_{1,N})] = H_{v_i v_j}\}_{v_i, v_j \in V_T}$, is

$$p(T) = \frac{1}{Z} \exp\Bigl( \sum_{v_1 \in V_T,\, v_2 \in V_T} \alpha_{v_1 v_2}\, C_{v_1 v_2}(W_{1,N}) \Bigr) \pi(T)$$

where $Z$ is the normalizing constant and the $\alpha_{v_1 v_2}$ are the Lagrange multipliers chosen to satisfy the constraints.

This distribution is a Markov random field with the following neighborhood structure on the leaves:

$$p(w_i \mid T \setminus w_i) = p(w_i \mid w_{i-1}, w_{i+1}, \gamma_i) \qquad (5)$$

where $\gamma_i$ is the part-of-speech of $w_i$. Note that because of the added lexical neighbors, the distribution is no longer context-free.

A  second,  more  computationally  efficient  model  which  re¬ 
tains  the  neighborhood  structure  of  the  MRF  above  is  given 
by  the  distribution 


CviVi(wltN) 
N- 1 


P (T)  =  (6) 

i=2 

where  T*  is  a  tree  down  to  the  preterminal,  or  part-of-speech, 
level.  This  model  is  interpreted  as  a  SCF  model  generating  a 
sequence  of  parts-of-speech  with  word  attachment  according 
to  a  non-stationary  Markov  chain. 
Abstract — A multialphabet arithmetic coding scheme with a weighted history model is presented for variable-length coding of the video symbols in video compression applications.

I.  Introduction 

A limited past history model introduced by Ghanbari [1] uses a limited number of past symbols to estimate the probability distribution. This model requires a relatively large buffer to achieve its optimal compression performance. Here we present a weighted history model that uses a smaller buffer and obtains better performance.

II.  Weighted  History  Model 

Suppose there are $p$ possible occurrences, and the alphabet used in arithmetic coding is defined as $S_1, \ldots, S_p$. The buffer size used in the limited past history model is $M$, and the number of occurrences of $S_i$ in the buffer is denoted by $O_i$ for every index $i$ between 1 and $p$. Adding all occurrences in the buffer thus gives the buffer size, i.e., $O_1 + O_2 + O_3 + \cdots + O_p = M$, and the relative frequency of symbol $S_i$ can be obtained by
freq(Si)  =  .  Therefore  the  corresponding  cumulative 

frequency  of  symbol  5,  is  cum-freq(Si)  =  fre(l(^k)- 

The major disadvantage of the limited past history model is caused by the requirement that the occurrence count of each symbol be at least one for arithmetic coding. The limited past history model overestimates the probability of each symbol
by  and  the  total  overhead  probability  is  equal  to 

When  the  buffer  size  M  is  small,  the  overhead  probability  is 
almost  one.  That  is,  the  probability  distribution  obtained  by 
the  limited  past  history  buffer  is  nearly  invariant  to  occur¬ 
rence  in  the  history  buffer,  and  the  statistical  property  of  the 
source  data  is  not  reflected  by  this  model. 

To strengthen the relation between the probability distribution and the occurrences in the buffer, we can simply apply a weight to the buffer. Therefore, the frequency of the $i$th symbol is freq($S_i$) = . The total overhead probability of

the  weighted  history  model  is  which  is  much  smaller 

than  that  of  the  limited  past  history  model,  especially  when 
the  buffer  size  is  small. 
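
The effect of the weight can be sketched numerically. The formulas below are an assumption made for this illustration, not taken from the text: each symbol gets one extra count so that no probability is zero, giving freq(S_i) = (W·O_i + 1)/(W·M + p) and a total overhead probability of p/(W·M + p), with W = 1 standing in for the unweighted model. The function names are hypothetical.

```python
from fractions import Fraction


def weighted_frequencies(occurrences, weight=1):
    """Assumed form: freq(S_i) = (W * O_i + 1) / (W * M + p),
    where M = sum of occurrences and p = alphabet size."""
    p = len(occurrences)
    m = sum(occurrences)
    total = weight * m + p
    return [Fraction(weight * o + 1, total) for o in occurrences]


def overhead_probability(num_symbols, buffer_size, weight=1):
    """Probability mass spent on the mandatory extra count per symbol
    (assumed form p / (W * M + p))."""
    return Fraction(num_symbols, weight * buffer_size + num_symbols)


if __name__ == "__main__":
    # 256 symbols, buffer of 16 past symbols
    for w in (1, 16, 128):
        print(w, float(overhead_probability(256, 16, w)))
```

Under these assumptions a small buffer with weight 1 spends most of the probability mass on the extra counts, while a weight in the tens or hundreds shifts it back to the symbols actually observed.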

The weighted history model uses a smaller buffer than the limited past history model does. Consequently, the weighted history model adapts faster and can exploit more of the local redundancy. The performance of arithmetic coding with the weighted history model was investigated for various buffer sizes and various weights. Fig. 1 uses coded data from the pyramid VQ [2] as the source data. Five different weights of the weighted history model with various buffer sizes are shown. From this figure, it can be seen that the weighted history model outperforms the limited past history model, especially when the buffer size is small. A large weight will reduce the probability of the symbols that are not in the current

1This  work  was  supported  by  National  Science  Council,  ROC 
under  the  contract  NSC82-0404-E009-338 


buffer.  If  the  next  symbol  is  not  in  the  history  buffer,  a  long 
codeword  will  be  assigned  to  represent  this  symbol  because  of 
low  probability.  From  our  experiments,  an  appropriate  weight 
for  the  weighted  history  model  is  in  the  range  from  16  to  128. 

III.  Hardware  Implementation 

The  weighted  history  model  uses  a  smaller  history  buffer  to 
model  the  cumulative  density  function  of  the  arithmetic  coder, 
and  uses  smaller  counters  to  record  the  cumulative  frequen¬ 
cies  than  the  limited  past  history  model.  This  is  because  each 
occurrence  in  the  history  buffer  is  multiplied  by  a  weight,  thus 
all  bits  below  the  weight  are  not  changed  if  the  weight  is  an  in¬ 
tegral  power  of  2.  Because  of  smaller  buffer  size  and  counters, 
the  weighted  history  model  is  well  suited  for  hardware  imple¬ 
mentation  in  conjunction  with  the  multiplication-free  multi¬ 
alphabet  arithmetic  coder  proposed  in  [3]. 

IV.  Conclusion 

A weighted history model can overcome the disadvantages of the limited past history model. The performance of the weighted history model is better than that of the limited past history model, and the history buffer used in the weighted history model is smaller. From the experiments, it can be seen that arithmetic coding with the weighted history model is well suited to image coding.
Abstract  —  Consider  the  problem  of  compressing
a  uniformly  quantized  IID  source.  A  traditional  ap¬ 
proach  is  to  assign  variable  length  codewords  to  the 
quantizer  output  symbols  or  groups  of  symbols  (e.g., 
Huffman  coding).  Here  we  propose  an  alternative  so¬ 
lution:  assign  a  fixed  length  binary  codeword  to  each 
output  symbol  in  such  a  way  that  a  zero  is  more  likely 
than  a  one  in  every  codeword  bit  position.  This  re¬ 
dundancy  is  then  exploited  using  a  block-adaptive  bi¬ 
nary  arithmetic  encoder  to  compress  the  data.  This 
technique  is  simple,  has  low  overhead,  and  can  be  used 
as  a  progressive  transmission  system. 

I.  Encoding  Procedure 

A  continuous  source  with  probability  density  f(x)  is  quantized 
by  a  uniform  quantizer  whose  output  symbols  are  mapped  to 
b  bit  codewords.  The  first  codeword  bit  indicates  the  sign  of 
the  quantizer  reconstruction  point.  Each  successive  bit  gives 
a  further  level  of  resolution  and  is  assigned  so  that  zeros  are 
more  concentrated  near  the  origin.  Figure  1  illustrates  this 
mapping  for  b  =  4.


Fig.  1:  Example  of  a  pdf  and  codeword  assignment  for  a  four  bit 
uniform  quantizer. 

We assume that $f(x)$ is symmetric about $x = 0$ and nonincreasing with $|x|$ so that the probability is more concentrated near the origin. Such sources are not uncommon in practice.
Because  of  this  assumption,  the  codeword  assignment  ensures 
that  a  zero  will  be  more  likely  than  a  one  in  every  bit  position. 

Codewords corresponding to $N$ adjacent source samples are grouped together. The $N$ sign bits of the codeword sequence are encoded using a block-adaptive binary arithmetic encoder. Then the $N$ next most significant bits are encoded, and so on. Each bit sequence is encoded independently: at the $i$th stage the arithmetic coder estimates the unconditional probability that the $i$th codeword bit is a zero. This can be viewed as a simple progressive transmission system: each subsequent codeword bit gives a further level of detail about the source.

The  obvious  loss  is  that  we  lose  the  benefit  of  inter-bit 
dependency.  E.g.,  the  probability  that  the  second  bit  is  a 
zero  is  not  in  general  independent  of  the  value  of  the  first  bit, 
though  the  encoding  procedure  acts  as  if  it  were.  However, 

1  The  research  described  in  this  paper  was  performed  at  the  Jet 
Propulsion  Laboratory,  California  Institute  of  Technology,  under 
contract  with  the  National  Aeronautics  and  Space  Administration. 


for  many  sources  (e.g.,  Gaussian  and  Laplacian),  this  loss  is 
small,  and  this  technique  often  has  lower  redundancy  than 
Huffman  coding,  because  the  arithmetic  coder  is  not  required 
to  produce  an  output  symbol  for  every  input  symbol. 

The independent treatment of the codeword bits provides some benefits. The overhead required increases linearly in $b$. By contrast, because the number of codewords is $2^b$, the overhead of block-adaptive Huffman coding increases exponentially in $b$ unless we are able to cleverly exploit additional information about the source [2].

II.  Arithmetic  Encoder  Operation 

A binary arithmetic encoder has a single parameter $P$, the anticipated probability of a zero. We encode an $N$-length sequence of bits block-adaptively, i.e., the encoder output sequence is preceded by overhead bits that identify to the decoder the value of $P$ being used. By using $\log_2 N$ bits of overhead, we could specify the exact frequency of zeros in the sequence, but by using fewer bits we can exchange accuracy for lower overhead. If $m$ overhead bits are used, we can select $2^m$ probabilities $\{p_1, p_2, \ldots, p_{2^m}\}$ that can be used as values for $P$. This amounts to using line segments to approximate the binary entropy function [1].

Omitting the remaining details, we find that for large $N$,
to  minimize  the  maximum  redundancy  (including  overhead), 
the  probability  values  are 

and the optimal number of overhead bits $m$ is approximately $m \approx \frac{1}{2} \log_2 N + \log_2 \pi - 1$.

The encoder counts the number of zeros in the input sequence to determine the probability index $i$. We transmit $m$ bits to identify $i$, followed by the arithmetic encoder output sequence. The encoder and decoder both use parameter $P = p_i$.


III.  Performance 

The  rate  R  of  the  bit-wise  arithmetic  coder  is  approximately 


A(Q)  + + 


N 


21^  +  log^-l  +  llog^ 


where $H(Q)$ is the entropy of the quantized source and $\mathcal{R}$ is the redundancy due to independent treatment of the codeword bits.
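
The redundancy caused by treating the codeword bits independently can be illustrated with a small calculation. The sketch below assumes a Laplacian source and a sign-magnitude codeword assignment (sign bit, then the magnitude index in natural binary); the actual assignment of Fig. 1 may differ, and all function names are hypothetical. Only the magnitude bits are analysed because the sign bit of a symmetric source is independent of the magnitude.

```python
from math import exp, log2


def laplacian_index_probs(b, step, scale=1.0):
    """P(magnitude index = k) for k = 0 .. 2^(b-1) - 1 when |x| is
    exponential with the given scale; the last bin absorbs the tail."""
    levels = 2 ** (b - 1)
    probs = [exp(-k * step / scale) - exp(-(k + 1) * step / scale)
             for k in range(levels)]
    probs[-1] = exp(-(levels - 1) * step / scale)
    return probs


def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bitplane_zero_probs(index_probs, magnitude_bits):
    """Probability that each magnitude bit (most significant first) is zero."""
    return [sum(pr for k, pr in enumerate(index_probs) if not (k >> j) & 1)
            for j in reversed(range(magnitude_bits))]


def bitwise_redundancy(index_probs, magnitude_bits):
    """Sum of per-bit-plane entropies minus the joint entropy of the index."""
    joint = -sum(pr * log2(pr) for pr in index_probs if pr > 0)
    planes = sum(binary_entropy(z)
                 for z in bitplane_zero_probs(index_probs, magnitude_bits))
    return planes - joint


if __name__ == "__main__":
    probs = laplacian_index_probs(b=4, step=0.5)
    print(bitplane_zero_probs(probs, 3))
    print(bitwise_redundancy(probs, 3))
```

For a density that is nonincreasing in $|x|$, every magnitude bit plane has a zero probability of at least one half under this assignment, and the redundancy is nonnegative because the sum of marginal entropies can never be less than the joint entropy.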
Abstract — The problem of enumerative coding was first considered in [1]. For coding words of length $n$, the method from [1] has an encoding and decoding speed of $O(n)$ as $n \to \infty$. We propose a code with a higher speed: $O(\log^3 n \log\log n)$, $n \to \infty$. This code is close to the author's method from [2].

I.  Introduction  and  the  Main  Idea 

The problem of enumerative coding is well known in information theory and is widely applied to retrieval problems and combinatorial analysis [1]. The suggested fast code uses the method from [2]. The simplest but important example of enumerative coding is the problem of translating numbers from one number system to another. We use this example to describe the main idea of the proposed method. Suppose we have to translate the number $x_1 \ldots x_n$ from the $m$-ary system ($m > 2$) to the binary system. A "common" method is based on the well-known Horner scheme:

(1) $\mathrm{code}(x_1 x_2 \ldots x_n) = (\ldots((x_1 m + x_2)m + x_3)m + \ldots)m + x_n$

When we calculate in the binary system, we obtain the value of $x_1 \ldots x_n$ in the binary system. We shall assess the calculation time by the number of operations on single-bit words. We use the Schönhage-Strassen method of multiplication and division of numbers. For this method the time of multiplication of two numbers with $L$ digits each is $O(L \log L \log\log L)$, $L \to \infty$ [3]. It is easy to see that the time for calculation by (1) is not less than $cn^2$, $c > 0$, $n \to \infty$. Hence, the speed is not less than $cn$. We suggest computing by the scheme

(2) $\mathrm{code}(x_1 \ldots x_n) = \bigl(\ldots\bigl(((x_1 m + x_2)\,m^2 + (x_3 m + x_4))\,m^4 + ((x_5 m + x_6)\,m^2 + (x_7 m + x_8))\bigr)\ldots\bigr)$

In this case the main part of the multiplications is carried out on comparatively small numbers, and when (2) is used, the time for computing is $O(n \log^2 n \log\log n)$, $n \to \infty$, and the encoding speed is $O(\log^2 n \log\log n)$. We can see that a "proper" arrangement of brackets makes it possible to decrease the calculation time substantially. It is worth noting that the described method is known as the "divide and conquer" principle [3].
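
The two bracketings can be compared with a short sketch. Both functions below are illustrative names introduced here; they return the same integer, and differ only in how the multiplications are grouped. Python's built-in integer multiplication is not the Schönhage-Strassen method, so the sketch shows the arrangement of brackets rather than the stated complexity.

```python
def horner(digits, m):
    """Scheme (1): ((x1*m + x2)*m + x3)*m + ... evaluated left to right."""
    value = 0
    for d in digits:
        value = value * m + d
    return value


def divide_and_conquer(digits, m):
    """Scheme (2): convert each half separately, then combine the halves
    with a single multiplication by m**(length of the right half)."""
    def convert(lo, hi):
        # returns (value of digits[lo:hi], m ** (hi - lo))
        if hi - lo == 1:
            return digits[lo], m
        mid = (lo + hi) // 2
        left, left_pow = convert(lo, mid)
        right, right_pow = convert(mid, hi)
        return left * right_pow + right, left_pow * right_pow

    if not digits:
        return 0
    return convert(0, len(digits))[0]


if __name__ == "__main__":
    digits = [2, 0, 1, 2, 1, 0, 2, 2]
    print(horner(digits, 3), divide_and_conquer(digits, 3))
```

In the second function most multiplications involve short operands near the leaves of the recursion, and only a few involve long operands near the root.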

II.  Main  Result 

We use definitions from [1]. Let $A = \{0, 1, \ldots, m-1\}$ be an alphabet of $m$ letters, $m > 2$, and let $A^n$ be the set of all words of length $n$ over the alphabet $A$. Let an arbitrary $S \subset A^n$ be a source. We order the words of $S$ lexicographically, and for an integer $1 \le k \le n$ and a word $x_1 \ldots x_k \in A^k$, denote by $N_S(x_1 \ldots x_k)$ the number of words produced by $S$ and having the prefix $x_1 \ldots x_k$. In [1] the code given by the formula

(3) $\mathrm{code}(x_1 \ldots x_n) = \sum_{i=1}^{n} \sum_{a < x_i} N_S(x_1 \ldots x_{i-1} a)$

was proposed. For $x_1 \ldots x_n \in S$ we define

(4) $P(x_1) = N_S(x_1)/|S|$, $\quad P(x_k / x_1 \ldots x_{k-1}) = N_S(x_1 \ldots x_k)/N_S(x_1 \ldots x_{k-1})$, $k = 2, \ldots, n$

(5) $q(x_k / x_1 \ldots x_{k-1}) = \sum_{a < x_k} P(a / x_1 \ldots x_{k-1})$, $k = 1, \ldots, n$

From (3), (4), (5) it is easy to obtain

(6) $\mathrm{code}(x_1 \ldots x_n) = |S|\,\bigl(q(x_1) + q(x_2/x_1)P(x_1) + q(x_3/x_1 x_2)P(x_1)P(x_2/x_1) + \ldots\bigr)$

The scheme of the proposed method is as follows. Each $P(x_k/x_1 \ldots x_{k-1})$, $q(x_k/x_1 \ldots x_{k-1})$, $x_1 \ldots x_k \in A^k$, can be written in the form of a word with $2\log n + O(1)$ digits. Then (6) is rewritten in the form

$\mathrm{code}(x_1 \ldots x_n) = |S|\,\bigl((q(x_1) + q(x_2/x_1)P(x_1)) + (P(x_1)P(x_2/x_1))\,(q(x_3/x_1 x_2) + q(x_4/\ldots)P(x_3/\ldots)) + ((P(x_1)P(x_2/x_1))(P(x_3/\ldots)P(x_4/\ldots)))\,(q(x_5/\ldots) + q(x_6/\ldots)P(x_5/\ldots)) + \ldots\bigr)$

(Here we used a "proper" arrangement of brackets, just as in going over from (1) to (2).) Decoding is constructed similarly, by using division. It is easy to calculate that the encoding and decoding speed is $O(\log^3 n \log\log n)$ as $n \to \infty$.
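
A brute-force sketch can confirm that (3) and (6) give the same code word number on a toy source. The helper names are hypothetical, prefix counts are obtained by scanning the whole source (which is only sensible for tiny examples), and exact rational arithmetic replaces the truncated $2\log n + O(1)$-digit representation, so this illustrates the formulas rather than the fast method.

```python
from fractions import Fraction
from itertools import product


def prefix_count(source, prefix):
    """N_S(prefix): number of source words that start with the prefix."""
    return sum(1 for w in source if w[:len(prefix)] == prefix)


def enumerative_code(source, word):
    """Formula (3): sum over positions i and letters a < x_i of
    N_S(x_1 ... x_{i-1} a)."""
    code = 0
    for i, x in enumerate(word):
        for a in range(x):
            code += prefix_count(source, word[:i] + (a,))
    return code


def code_via_probabilities(source, word):
    """Formula (6): |S| * sum_k q(x_k / prefix) * P(x_1) ... P(x_{k-1} / ...)."""
    total = Fraction(0)
    prob_prefix = Fraction(1)  # product of the conditional probabilities so far
    for i, x in enumerate(word):
        n_prev = prefix_count(source, word[:i])
        smaller = sum(prefix_count(source, word[:i] + (a,)) for a in range(x))
        total += Fraction(smaller, n_prev) * prob_prefix
        prob_prefix *= Fraction(prefix_count(source, word[:i + 1]), n_prev)
    return len(source) * total


if __name__ == "__main__":
    # toy source: binary words of length 4 with exactly two ones
    source = sorted(w for w in product((0, 1), repeat=4) if sum(w) == 2)
    for w in source:
        print(w, enumerative_code(source, w), code_via_probabilities(source, w))
```

For a word of the source, the value of (3) is its position in the lexicographic ordering of the source, counted from zero.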
Abstract — This study is concerned with the selection of waveform structure and message redundancy for reliable reception in a noisy channel and for not interfering with a secondary use of the signal such as establishing frequency and phase synchrony. We propose and analyze the Bit Reversal (BR) encoding scheme, which inserts redundancy to minimize the spectral energy near zero frequency. The primary mathematical tool for this analysis is Markov chains with a finite number of states and constant transition probabilities.


I.  Introduction 

The analysis of the power spectral density (PSD) of synchronous baseband digital signals plays a fundamentally important role in the design of communication and signal processing systems. From [1,2,3], we see that the PSD is characterized by the modulator design (this factor determines the structure of the model and hence the transition matrix) and the signal design (the choice of the set of waveforms used). In this paper, we propose and analyze a spectral shaping scheme, the Bit Reversal (BR) encoding scheme, that increases the data bandwidth by a very small fraction, yet reduces the spectral energy near D.C. to nearly zero. The primary mathematical tool for this analysis, as for most digital signal formats, will be Markov chains with a finite number of states and constant transition probabilities.
Fig.  1:  The  Bit  Reversal  Encoding  Scheme
Fig.  2:  The  BR  Encoding  Markov  Model  of  L=4


II.  The  Proposed  BR  Encoding  Scheme 

The idea of the BR encoding model is to attempt to balance the number of +1's and -1's in the transmitted stream by inserting a redundant bit every $L$-th bit. This bit indicates whether the $L$-block is transmitted directly or sign reversed before transmission. The decision is based on the excess of +1's or -1's in the message of the $i$-th $L$-block versus the excess in the transmitted stream from time zero.

More precisely, assume $L$ is an even integer; let $m_n$ be a sequence of ±1's representing the message stream and let $x_n$ be the transmitted stream. For $1 \le k < \infty$ define

$$C_0 = 1 \quad \text{and} \quad C_k = C_0 + \sum_{n=1}^{kL} x_n \qquad (1)$$

to  actively  maintain  the  digital  sum  variation  (DSV)  of  the 
transmitted  stream.  The  BR  encoding  scheme  is  summarized 
in  Fig  1. 
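
One possible reading of this rule is sketched below. The details are assumptions made for illustration, since the exact procedure is the one in Fig. 1: the redundant bit is placed last in each frame, it is +1 when the frame is sent directly, the whole frame (including that bit) is negated when the frame's excess has the same sign as the running sum, and the running sum starts at 1.

```python
import random


def br_encode(message, block_len=4, start_sum=1):
    """Encode a +1/-1 message whose length is a multiple of block_len - 1.
    Each frame carries block_len - 1 message bits plus one flag bit."""
    data_len = block_len - 1
    running = start_sum
    stream = []
    for i in range(0, len(message), data_len):
        frame = list(message[i:i + data_len]) + [1]  # flag +1: sent directly
        excess = sum(frame)
        if excess * running > 0:
            # frame would push the running sum further from zero: reverse it
            frame = [-x for x in frame]  # the flag becomes -1
            excess = -excess
        stream.extend(frame)
        running += excess
    return stream


def br_decode(stream, block_len=4):
    """Undo the reversal using the flag bit at the end of each frame."""
    message = []
    for i in range(0, len(stream), block_len):
        frame = stream[i:i + block_len]
        flag = frame[-1]
        message.extend(flag * x for x in frame[:-1])
    return message


if __name__ == "__main__":
    rng = random.Random(0)
    msg = [rng.choice((-1, 1)) for _ in range(30)]
    sent = br_encode(msg)
    print(sent)
    print(br_decode(sent) == msg)
```

With a frame length of 4 and these assumptions, the running sum at frame boundaries only takes the values 3, 1, -1 and -3, which matches the number of states in the four-state model considered for $L = 4$.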

III.  PSD  of  a  BR  Encoding  Scheme  of  L=4 

Let us consider the special case of $L = 4$ for the BR encoding scheme, i.e., there are three message bits in each frame of four bits (one redundant bit). Its Markov model consists of the four states $\{E_3, E_1, E_{-1}, E_{-3}\}$, as shown in Fig. 2. This is a model in which the waveforms are probabilistic functions of state transitions. Applying the theorem in [3], we have the PSD:


_  4smc2(?r/T)  2smc2(7r/T)  . 

U)  T  ”  3(17  —  8  cos(27r/T))T  ^ 

+8  cos(27r/T)  -f*  19cos(47t/T)  +  13cos(67r/T) 

-f  13  cos(87t/T)  —  4cos(107t/T)) 

which equals zero at D.C. The data bandwidth is increased by a very small fraction.

IV.  Conclusion 

We proposed and analyzed a new spectral shaping scheme without subcarriers, the BR encoding scheme. Unlike the traditional method of modulating the data onto a subcarrier, which results in a much larger bandwidth than is necessary to transmit the data, it increases the data bandwidth by a very small fraction, yet reduces the spectral energy near D.C. to nearly zero. We apply a general formula given in [3] to compute the PSD of the BR encoding scheme.
Abstract  —  Optimum  conditions  for  maximizing  the
throughput  of  an  orthogonal  frequency  division  mul¬ 
tiplexed  (OFDM)  system  are  derived,  and  an  algo¬ 
rithm  for  achieving  them  is  presented.  Theoretical 
bounds  on  performance  are  derived  and  used  to  com¬ 
pare  OFDM  with  conventional  equalized  single  car¬ 
rier  QAM  for  both  the  conventional  error  probability 
criterion  and  a  criterion  based  on  the  mean-squared 
error.  OFDM  is  shown  to  achieve  greater  throughput 
than  equalized  single  carrier  QAM,  especially  at  low 
to  intermediate  signal-to-noise  ratios  and  on  channels 
with  poor  spectral  properties. 

I.  Summary 

Orthogonal  frequency  division  multiplexing  (OFDM),  a  form 
of  multicarrier  transmission,  has  attracted  attention  as  an  al¬ 
ternative  to  equalized  single  carrier  transmission  over  channels 
with  spectral  nulls,  multipath  or  fading. 

The  principle  of  OFDM  is  to  modulate  many  parallel  sub- 
carriers  by  dividing  the  high  rate  transmission  data  into  lower 
rate  sub-streams.  For  a  correctly  chosen  subcarrier  spacing, 
the  modulated  sub-streams  are  orthogonal,  and  hence  inter¬ 
channel  interference  (ICI)  is  avoided.  For  sufficiently  narrow 
subchannel  bandwidths,  the  system  can  be  considered  to  be 
a  set  of  parallel  Nyquist  I  channels.  Subchannels  which  have 
severe  attenuation  can  then  be  avoided,  and  subchannels  with 
good  gain-to-noise  characteristics  can  be  exploited  by  allocat¬ 
ing  them  more  power  and  data.  As  each  subcarrier  is  mod¬ 
ulated  using  low  rate  data,  the  symbol  period  is  far  greater 
than  for  a  single  carrier  modulated  at  the  same  total  data  rate. 
This  mitigates  the  effects  of  impulsive  noise  and  fading. 

Early commercial multicarrier modems used guard intervals in the time and frequency domains to reduce the effects of intersymbol interference (ISI) and interchannel interference (ICI). Each subcarrier was modulated using the same power and data rate. Towards the end of the sixties, a number of authors, notably Chang [1], used overlapping orthogonal spectra to increase the efficiency of multicarrier systems. More recently, Kalet [2] introduced the concept of adjusting the power and data assigned to each subcarrier to increase the throughput further.

Kalet  stated  that  maximum  throughput  would  be  achieved 
when  the  data  and  power  assignments  were  such  that  each 
subcarrier  achieved  the  same  symbol  error  probability.  Based 
on  these  assumptions,  Zervos  and  Kalet  [3]  concluded  that 
OFDM  would  not  yield  significantly  greater  throughput  than 
decision  feedback  equalized  single  carrier  transmission. 

In  this  paper,  we  do  not  constrain  the  error  probability  to 
be  the  same  over  all  the  subcarriers.  An  optimization  proce¬ 
dure  is  used  to  determine  the  conditions  which  must  be  met 
to  achieve  maximum  throughput,  and  an  iterative  algorithm 
is  presented  which  will  rapidly  achieve  these  conditions. 

It is shown that, using the conventional error probability criterion, OFDM will in fact always outperform decision feedback equalized single carrier QAM. The increase in throughput is most significant at low and intermediate signal-to-noise ratios, where error propagation renders the DFE impractical.

As an example, the NEXT-dominated high-speed digital subscriber loop is considered. Using the results presented in [4], the exact error probability of single carrier QAM using a DFE is compared to optimized OFDM, and it is seen that OFDM gives significantly better performance. At a data rate of 1.28 Mbps, the equalized single carrier can be used over wire lengths up to 11.5 kft at a bit error probability of $10^{-5}$. For the same data rate and bit error probability, OFDM can be used for lengths up to 15.5 kft.

It  is  well-known  that  the  mean-square  error  (mse)  is  a 
tractable  criterion  in  the  design  of  linear  and  decision  feed¬ 
back  equalizers.  We  present  a  criterion  for  optimizing  OFDM 
transmission  which  is  based  on  the  mse.  It  enables  a  direct 
comparison  to  be  made  between  OFDM  and  equalized  single 
carrier  transmission  in  which  the  equalizer  is  designed  using 
the  minimum  mse  criterion.  Examples  show  that  OFDM  again 
outperforms  equalized  single  carrier  QAM,  especially  at  low 
and  intermediate  SNRs  and  on  channels  with  poor  spectral 
properties. 
Abstract  —  A  multitone  transmission  scheme  that
uses  a  nonlinear  binary  code  to  specify  multitone  sig¬ 
nal  constellations  is  proposed  and  motivated.  A  tech¬ 
nique  for  designing  the  nonlinear  code  is  presented, 
and  methods  for  making  symbol  decisions  and  era¬ 
sures  without  knowledge  of  signal  and  noise  strength 
parameters  are  given.  The  performance  of  a  system 
using  this  scheme  with  Reed-Solomon  error  control 
coding  is  discussed. 

I.  Introduction 

Multicarrier modulation schemes have received increasing attention for wireless communication systems including messaging systems [1] and microwave radio [2]. In this paper, we consider multicarrier schemes that employ on-off keying on orthogonally spaced subcarriers. Such systems are amenable to low-complexity noncoherent demodulation, and because multiple bits are transmitted per symbol, these systems also benefit from the advantages of long symbol durations including simulcasting capability and reduced requirements for channel equalization.

A  natural  approach  is  to  use  parallel  channels  indepen¬ 
dently;  i.e.,  the  data  is  partitioned  into  separate  bit  streams 
that  are  each  modulated  on  separate  subcarriers.  In  this  work, 
we  explore  the  use  of  a  binary  code  across  the  separate  bit 
streams  to  introduce  and  exploit  dependencies  between  the 
modulated  subcarriers. 


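where $\delta$ is a fixed threshold and $d(m)$ is defined as one of the following:

$$d(m) = \frac{y \cdot c_m}{\|c_m\|} \quad \text{or} \quad \frac{y \cdot (c_m - \bar{c}_m)}{\|c_m\|} \quad \text{or} \quad \frac{y \cdot c_m}{\|c_m\|} - \frac{y \cdot \bar{c}_m}{\|\bar{c}_m\|}$$

($\bar{c}_m$ is the ones complement of $c_m$, $\|\cdot\|$ denotes Hamming weight, and $y \cdot 0/\|0\| = 0$).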
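
A decision device of this kind can be sketched as follows. The sketch assumes the statistic $d(m) = y \cdot c_m/\|c_m\| - y \cdot \bar{c}_m/\|\bar{c}_m\|$ (average energy on the tones that should be on minus average energy on the tones that should be off), with the stated convention that a term with zero Hamming weight contributes zero. The function names, the example codebook and the threshold value are hypothetical.

```python
def decision_statistic(y, codeword):
    """Average detector output over the 1-positions of the codeword minus
    the average over its 0-positions; an empty set contributes 0."""
    ones = sum(codeword)
    zeros = len(codeword) - ones
    on_energy = sum(yi for yi, c in zip(y, codeword) if c == 1)
    off_energy = sum(yi for yi, c in zip(y, codeword) if c == 0)
    on_term = on_energy / ones if ones else 0.0
    off_term = off_energy / zeros if zeros else 0.0
    return on_term - off_term


def decide(y, codebook, threshold):
    """Return the index of the best codeword if its statistic exceeds the
    threshold, otherwise None to signal an erasure."""
    stats = [decision_statistic(y, c) for c in codebook]
    best = max(range(len(codebook)), key=stats.__getitem__)
    return best if stats[best] > threshold else None


if __name__ == "__main__":
    codebook = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0)]
    print(decide((0.9, 1.1, 0.1, 0.0), codebook, threshold=0.5))
    print(decide((0.5, 0.5, 0.5, 0.5), codebook, threshold=0.1))
```

The rule needs no knowledge of the signal or noise strength beyond the choice of threshold; an erasure returned as `None` would then be passed to the errors-and-erasures decoder.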

III.  Multitone  Code  Design 

Using linear codes for specifying the multitone constellations results in very poor performance when the decision rules described above are used. We have therefore focused on the use of nonlinear codes for this purpose. Our approach is to choose a linear code with good Hamming distance properties and generate various nonlinear codes from this code by applying different combinations of bit inversions (inverting the $i$th bit of every codeword in the code for various values of $i$). The resulting codes have the same distance properties as the original linear code, but with different weight distributions.

Numerical  results  show  that  codes  containing  codewords 
with  very  small  or  very  large  Hamming  weight  perform  worse 
than  codes  with  less  variation  in  weight.  This  fact  may  lead 
one  to  conclude  that  constant  weight  codes  should  be  used. 
However,  the  Hamming  distance  properties  of  constant  weight 
codes  are  usually  inferior  to  those  of  codes  based  on  the  best 
linear  codes.  We  have  shown  that  the  guaranteed  error  cor¬ 
recting  capability  tm  of  constant  weight  codes  must  satisfy 


II.  System  Model 

The multitone modulation scheme we consider can be viewed as a generalization of M-ary frequency shift keying. A multitone channel encoder uses a binary $(n, k)$ code to specify the mapping from data to multitone signal constellations $\{c_i\}$, $i = 1, \ldots, 2^k$. The 1s in a codeword dictate which tones are transmitted simultaneously. An $(N, K)$ singly extended Reed-Solomon (RS) code with $N = 2^k$ is employed to provide error control. Multitone symbols are interleaved to mitigate the effects of channel fading.

The demodulator consists of a bank of $n$ energy detectors whose outputs are denoted by the vector $y = (y_1, y_2, \ldots, y_n)$. The output $y$ is used by a decision device that makes symbol decisions or declares symbol erasures. The decision device is followed by an RS decoder that employs errors-and-erasures bounded-distance decoding.

Because  the  multitone  constellations  are  not  orthogonal  (in 
general),  a  maximum  likelihood  detector  requires  knowledge 
of  signal  and  noise  parameters.  In  many  practical  applica¬ 
tions,  these  parameters  will  not  be  known,  and  they  may  vary 
significantly with time. We therefore consider the use of
decision devices based on simple linear combinations of the $y_i$'s.
We have investigated decision rules of the following form:

$$\text{choose } \begin{cases} c_i & \text{if } i = \arg\max_m d(m) \text{ and } \ldots > \delta; \\ \text{erasure} & \text{otherwise} \end{cases}$$


j  even 


where the value(s) of $\ell$ that maximize $t_m$ must include $\ell =
\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$ ($t_m$ must also satisfy $t_m < \min(2\ell, 2(n - \ell))$). The
error correcting capability of many known linear codes exceeds
this bound.

IV.  System  Performance 

We  have  shown  that,  for  a  system  using  errors-only  de¬ 
coding  of  the  RS  code,  the  combination  of  a  good  nonlinear 
multitone  code  and  the  decision  rule  described  above  gives 
performance  in  AWGN  close  to  that  of  the  corresponding  lin¬ 
ear  multitone  code  with  maximum  likelihood  decoding.  Ad¬ 
ditionally,  we  have  shown  that  incorporation  of  errors-and- 
erasures  decoding  provides  significant  performance  improve¬ 
ments  in  channels  subject  to  AWGN  and  Rayleigh  fading. 

References 

[1] R. Petrovic, W. Roehr, and D. Cameron, “Multicarrier modulation
for narrowband PCS,” IEEE Trans. Veh. Technol., pp.
856-862, Nov. 1994.

[2] S. Aikawa, Y. Nakamura, and H. Takanashi, “Performance of
trellis coded 256 QAM super-multicarrier modem using VLSIs
for SDH interface outage-free digital microwave radio,” IEEE
Trans. Commun., pp. 1415-1421, Feb./Mar./Apr. 1994.




Achievable  Rates  for  Tomlinson-Harashima  Precoding 

Richard D. Wesel¹ and John M. Cioffi

Information  Systems  Laboratory,  Stanford  University,  Stanford,  California  94305 


Abstract  —  The  maximum  achievable  information 
rate  of  the  zero-forcing  Tomlinson-Harashima  pre¬ 
coder  (ZF-THP)  is  given  exactly.  Bounds  are  pro¬ 
vided  for  the  minimum  mean  square  error  (MMSE) 
THP.  Performance  of  THP  is  characterized  on  an  ex¬ 
ample  channel,  and  discussed  for  arbitrary  channels. 

Consider the power-constrained additive white Gaussian
noise (AWGN) channel with intersymbol interference (ISI)
where a real input sequence² $X(D)$ with $E[x_k^2] \le P$ is filtered
by $H(D)$ and distorted by real AWGN $n_k$.

For  this  channel,  we  compute  the  reduction  in  achievable 
rate  from  capacity  incurred  by  Tomlinson-Harashima  precod¬ 
ing  (THP)  combined  with  codes  designed  for  AWGN  without 
ISI.  Loss  due  to  finite  complexity  codes  will  be  neglected. 

Figure  1  shows  a  general  THP  system.  Linear  time  invari¬ 
ant  filters  F(D)  and  B(D)  are  chosen  to  minimize  optimality 
criteria  discussed  below.  B(D)  must  be  causal  and  monic. 


Figure  1:  Communication  system  using  THP 

$\Gamma_t$ is a mapping from $\mathbb{R}$ to $(-t/2, t/2]$, where $t$ is a positive real number.

Specifically, $\Gamma_t[v_k] = v_k + a_k$ where $a_k$ is the integer multiple
of $t$ for which $\Gamma_t[v_k] \in (-t/2, t/2]$. Figure 2 is equivalent
to Figure 1 with $a_k$ as defined above. The noise $\tilde{n}$ is $n$ filtered
by $F(D)$.
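
The modulo mapping defined above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `gamma_t` and `thp_modulus` are hypothetical, and `thp_modulus` uses the large-alphabet approximation that the transmit power is about $t^2/12$.

```python
import math

def gamma_t(v, t):
    """Return (Gamma_t[v], a), where a is the integer multiple of t
    that brings v + a into the half-open interval (-t/2, t/2]."""
    a = -t * math.ceil(v / t - 0.5)
    return v + a, a

def thp_modulus(power):
    """Interval width t whose uniform distribution has the given power (t^2/12 = P)."""
    return math.sqrt(12.0 * power)

# The upper endpoint t/2 is kept, the lower endpoint -t/2 is mapped to +t/2.
print(gamma_t(0.5, 1.0))    # (0.5, -0.0)
print(gamma_t(-0.5, 1.0))   # (0.5, 1.0)
print(thp_modulus(1.0))
```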




Figure  2:  Communication  system  equivalent  to  Figure  1. 

ZF-THP is the scheme originally proposed in [1, 2] with
$F(D)$ and $B(D)$ chosen so that $y_k = 0$ and $\tilde{n}_k$ is white. The
ZF-THP system is a memoryless channel with input $w$ and
output $\Gamma_t[w + \tilde{n}]$. The channel inputs are constrained by $w \in
(-t/2, t/2]$ and $E[x^2] \le P$. THP transmitter output power is
roughly $t^2/12$ for large-alphabet PAM [3]. Thus we restrict our
attention to the choice of $t$ which obeys the power constraint
with equality (i.e., $t = \sqrt{12P}$). This system's achievable rate
is

$$I_{\text{ZF-THP}} = \log_2(t) - h(\Gamma_t[\tilde{n}]) \quad (1)$$

where $h(\cdot)$ denotes differential entropy.

¹ Email: wesel@isl.stanford.edu. This work was supported by an
AT&T Foundation Fellowship and NSF grant NCR-9203131.

² Sequences will be denoted by their formal D-transforms
$X(D) = \sum_k x_k D^{-k}$.


The MMSE-THP is obtained by choosing $F(D)$ and $B(D)$
to minimize $\mathrm{VAR}(\tilde{n} + y)$. Ideal interleaving is assumed, which
produces a memoryless channel with input $w$ and output
$\Gamma_t[w + y + \tilde{n}]$, where $w$ is constrained as above. Our bounds
for this system are

$$I_{\text{MMSE-THP}} \ge \log_2(t) - h(T(\sigma^2, t)) \quad (2)$$

$$I_{\text{MMSE-THP}} \le \log_2(t) - h(\Gamma_t[\tilde{n}]) \quad (3)$$

where $T(\sigma^2, t)$ is a zero-mean Gaussian truncated to
$(-t/2, t/2]$ with variance $\sigma^2 = \mathrm{VAR}(y + \Gamma_t[\tilde{n}])$ after truncation.
Note that $\mathrm{VAR}(\tilde{n})$ depends on $F(D)$ and thus the right
hand sides of Equations (1) and (3) are not equivalent.

Figure  3  plots  Equations  (1)  (2)  and  (3)  as  well  as  capacity 
for  a  50  tap  bandpass  ISI  channel  with  AWGN. 


Figure  3:  Information  rates  for  example  channel 


For any $H(D)$ both MMSE bounds converge to $\log_2(t) -
\frac{1}{2}\log_2(2\pi e E[\tilde{n}^2])$ as SNR $\to \infty$. At high SNR, ZF-THP and
MMSE-THP identically suffer only the 1.53 dB or .255 bit
“shaping loss” from capacity regardless of $H(D)$.
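
The quoted shaping-loss figures can be checked with a short calculation. The sketch below assumes the standard comparison between a uniform input on an interval of width $t$ (power $t^2/12$) and a Gaussian input with the same differential entropy $\log_2 t$ (power $t^2/(2\pi e)$); the function name is hypothetical.

```python
import math

def shaping_loss():
    """Return the high-SNR shaping loss as (bits per channel use, dB)."""
    # Power ratio: uniform on width t versus Gaussian of equal differential entropy.
    ratio = (1.0 / 12.0) / (1.0 / (2.0 * math.pi * math.e))   # = pi * e / 6
    return 0.5 * math.log2(ratio), 10.0 * math.log10(ratio)

loss_bits, loss_db = shaping_loss()
print(f"{loss_bits:.4f} bit, {loss_db:.2f} dB")
```

The result is about 0.2546 bit and 1.53 dB, in agreement with the rounded values in the text.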

At low SNR, THP achievable rates can still be considerably
below capacity. Here, the loss is due entirely to the receiver
$\Gamma_t$ (and interleaving for the MMSE-THP). This behavior is
the reverse of that observed at high SNR where the loss was
entirely due to the transmitter $\Gamma_t$.

In  Figure  3  the  MMSE-THP  outperforms  the  ZF-THP. 
This  may  be  true  for  all  H(D).  We  have  shown  that  the  ZF- 
THP  rate  will  never  be  more  than  .08  bits  per  channel  use 
above  the  MMSE-THP  rate. 
I. Introduction

Partial-response  modulations  are  widely  used  for  spectrum 
shaping  and  bandwidth  reduction.  Linear  partial-response 
signaling  (PRS)  is  a  well-known  form  of  linear  coded  mod¬ 
ulation  which  traces  back  to  Lender  [2]  in  1963.  Due  to  its 
“age,”  it  is  widely  believed  that  the  properties  of  linear  PRS 
have  been  thoroughly  investigated.  However,  this  is  not  the 
case.  Due  to  the  lack  of  efficient  algorithms  for  maximum- 
likelihood  detection  the  research  on  the  subject  concentrated 
on  the  simplest  forms  of  PRS,  and  mainly  with  suboptimal 
detection  [1]. 

In  this  paper  we  analyze  the  properties  of  PRS  signals  gen¬ 
erated  from  complex-valued  functions.  Those  signals  have  not 
only  the  usual  intentional  intersymbol  interference  (ISI),  but 
also  an  intentional  interference  between  the  quadrature  com¬ 
ponents  of  the  RF  modulated  signal.  Our  objective  is  to  show 
how  these  two  forms  of  interference  should  be  used  together. 

The  theory  presented  here  is  particularly  important  in  the 
design  of  PRS  generators  for  bandwidth  efficient  coded  mod¬ 
ulation  [3],  which  are  schemes  with  severe  ISI.  Furthermore, 
since  the  results  are  valid  for  intentional  or  non-intentional 
ISI,  they  can  also  be  used  to  improve  performance  of  commu¬ 
nications  in  non-ideal  channels. 

The lowpass representation of a linear PRS is defined by

$$s_1(t) = \sum_{n=-\infty}^{\infty} u[n]\, h(t - nT), \quad (1)$$

where $u[n]$ is the data sequence, and $h(t)$ is a spectrum shaping
(generator) pulse. Note that here both $u[n]$ and $h(t)$ may
be complex-valued, with the real and imaginary parts of $s_1(t)$
corresponding to the in-phase and quadrature components of
the RF modulated signal. In most of the literature about linear
PRS coding the shaping pulse $h(t)$ is considered to be real-valued,
even when the data sequence $u[n]$ is complex-valued
(e.g., partial-response QAM modulation). Here we assume
that the imaginary part of $h(t)$ adds interference between the
quadrature components. We stress that the signals $s_1(t)$ considered
in this paper do not exist at baseband as a real signal;
rather, they take form only as RF signals. One can consider
the problem here as synthesis of coded signals directly at RF.

The spectrum used by (1) is defined by the spectral power
density

$$S_1(f) = a\,|H(f)|^2, \quad (2)$$

where $a$ is a constant, and $H(f)$ is the Fourier transform
of $h(t)$. Note that the dependence between the in-phase
and quadrature components makes the spectral power density
asymmetric, i.e., in general $|H(f)| \ne |H(-f)|$.

¹ This work was supported by CNPq, Conselho Nacional de
Desenvolvimento Científico e Tecnológico, Brazil.


II.  Phase-Shifted-Data  (Generalized)  PRS 

The combination of intersymbol interference with the interference
between quadrature components produces some unexpected
results. For instance, it is demonstrated [3] that if we apply a
complex frequency shift to $h(t)$, and define the signal

$$s_2(t) = \sum_{n=-\infty}^{\infty} u[n]\, h(t - nT)\, e^{j 2\pi f_s (t - nT)}, \quad (3)$$

then we can get a PRS with an energy/bandwidth performance
quite different from that of signal (1). (Of course, in the well-known
case where there is no ISI, there is no change in performance.)
We evaluate those effects by deriving the theoretical
asymptotic error probability, and also measuring it via simulations,
and it is shown that the energy efficiency can improve
when $f_s \ne 0$.

Alternately, if we apply a phase shift to $u[n]$, and define

$$s_3(t) = \sum_{n=-\infty}^{\infty} u[n]\, h(t - nT)\, e^{j 2\pi n f_s T}, \quad (4)$$

then  it  can  be  proved  that  the  generalized  PRS  (4)  has  exactly 
(not  only  asymptotically)  the  same  noise  immunity  as  (3).  At 
the  same  time,  the  spectrum  used  by  (4)  is  the  same  used 
by  (1),  and  it  is  not  shifted  as  with  (3). 
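
The algebraic link between the frequency-shifted form (3) and the phase-shifted-data form (4) can be checked numerically. The sketch below is only an illustration under stated assumptions: the two-symbol complex pulse `pulse`, the finite data block `u`, and the values of `T` and `fs` are all hypothetical. It verifies that the signal of form (3) with shift $f_s$ equals a pure frequency translation $e^{j 2\pi f_s t}$ of the signal of form (4) evaluated with shift $-f_s$; it does not reproduce the noise-immunity proof.

```python
import cmath
import math

def pulse(t, T):
    """Hypothetical complex-valued generator pulse h(t) lasting two symbol periods."""
    if 0.0 <= t < T:
        return 1.0 + 0.5j
    if T <= t < 2.0 * T:
        return 0.5 - 0.25j
    return 0.0j

def s2(t, u, T, fs):
    """Form (3): every pulse carries the factor exp(j 2 pi fs (t - nT))."""
    return sum(u_n * pulse(t - n * T, T) * cmath.exp(2j * math.pi * fs * (t - n * T))
               for n, u_n in enumerate(u))

def s3(t, u, T, fs):
    """Form (4): every data symbol carries the factor exp(j 2 pi n fs T)."""
    return sum(u_n * pulse(t - n * T, T) * cmath.exp(2j * math.pi * n * fs * T)
               for n, u_n in enumerate(u))

u = [1, -1, 1j, -1j, 1]          # finite data block standing in for the infinite sum
T, fs = 1.0, 0.15
times = [0.1 * i for i in range(60)]

# Largest deviation between s2 and the frequency-translated s3 over the time grid.
max_gap = max(abs(s2(t, u, T, fs) - cmath.exp(2j * math.pi * fs * t) * s3(t, u, T, -fs))
              for t in times)
print(max_gap)
```

Since the two signals differ only by a unit-magnitude carrier term, their envelopes coincide, which is consistent with the statement that the spectrum of (3) is a shifted copy while that of (4) is not shifted.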

Some  important  conclusions  follow  from  the  results  above: 

•  When  the  pulse  h(t)  is  set  by  the  channel  ISI,  the  fre¬ 
quency/phase  shifts  of  (3)  or  (4)  allow  us  to  improve 
performance  by  relieving  the  effect  of  the  ISI. 

•  During  the  design  of  optimized  complex-valued  PRS, 
subject  to  a  bandwidth  constraint,  it  is  necessary  to 
consider  a  sliding  bandwidth  parameter  in  order  to  find 
the optimal signals. Alternatively, the signal can be
designed for a bandwidth centered at $f = 0$, and be
optimized for the generalized PRS (4).

•  Better  PRS  coding  schemes  may  be  synthesized  by  fre¬ 
quency  and  phase  shifts:  in  a  given  RF  bandwidth 
schemes  with  better  free  distance  exist,  or  at  a  fixed 
distance,  schemes  with  better  bandwidth  exist. 
Abstract — We present a new self-training method for adjusting
the  coefficients  of  a  transversal  equalizer  with  T-spaced  taps  in  a  multi¬ 
level  partial  response  class-IV  (PRIV)  system.  Self-training  equaliza¬ 
tion  from  distorted  random  data  signals  is  inherently  more  difficult  to 
achieve  for  partial-response  systems  than  for  full-response  systems. 
Also,  because  of  the  lack  of  excess  bandwidth,  traditional  bandedge 
timing  recovery  schemes  cannot  be  applied  in  PRIV  systems.  On  the 
other  hand,  an  equalizer  with  T-spaced  taps  is  sufficient  to  obtain 
equalized  output  signals  for  arbitrary  sampling  phase.  Convergence 
with  the  known  self-training  algorithm  by  Sato  is  too  slow  to  ever 
reach  satisfactory  performance,  e.g.,  for  switching  to  decision-directed 
equalization,  when  no  recovered  clock  is  available  and  the  phase  of  the 
local  receiver  clock  drifts  only  slightly  relative  to  the  phase  of  the  re¬ 
ceived  signal.  Following  Sato,  in  the  described  self-training  equaliza¬ 
tion  algorithm  we  first  transform  the  equalizer  output  into  full-res¬ 
ponse  form,  then  compute  a  pseudo-error  signal,  and  finally  translate 
the  pseudo-error  signal  into  an  error  signal  for  the  desired  partial-res¬ 
ponse  equalizer  output.  This  error  signal  is  used  to  adjust  the  equalizer 
coefficients  according  to  the  LMS  algorithm.  The  new  method  differs 
from  the  Sato  algorithm  in  two  ways.  First,  the  channel  inversion  for 
obtaining  full-response  signals  is  accomplished  exactly  by  mixed  lin¬ 
ear  feedback  and  decision  feedback  equalization,  whereas  in  the  case 
of  the  Sato  algorithm  the  inversion  is  only  achieved  approximately. 
Secondly,  for  the  derivation  of  pseudo-error  signals  more  knowledge 
of  the  statistical  properties  of  ideal,  but  noisy  full-response  signals  is 
exploited.  Basically,  the  pseudo  errors  are  obtained  from  knowledge 
of  the  largest  positive  and  negative  symbol  values  and  the  probability 
of  the  occurrence  of  equalized  signals  in  the  interval  between  these  val¬ 
ues  and  outside  these  values.  In  the  absence  of  noise,  the  new 
pseudo-error  signals  vanish  as  equalization  is  achieved.  We  present 
simulation  results  illustrating  the  superior  convergence  properties  of 
the  new  self-training  method. 


Summary 


Self-training  adaptive  equalization  has  mainly  been  studied  for 
full-response  systems  in  the  past,  e.g.,  in  [l]-[3].  Methods  to  achieve 
self-training  equalization  for  partial-response  systems  have  been  pro¬ 
posed  in  [4]  and  [5]  for  linear  and  distributed-arithmetic  equalizers,  re¬ 
spectively. 

We denote the output of the linear equalizer by $y_n$:

$$y_n = \mathbf{c}_n \mathbf{x}_n^T, \quad (1)$$

where $\mathbf{c}_n = \{c_{0,n}, \ldots, c_{N-1,n}\}$ represents the vector of equalizer coefficients
and $\mathbf{x}_n = \{x_n, \ldots, x_{n-N+1}\}$ the vector of signals stored in the
equalizer delay line at time $n$. The objective of an adaptive equalizer
for a PRIV system is to provide an equalized signal of the form

$$y_n = (a_n - a_{n-2}) + e_n, \quad (2)$$

where $a_n$ is the channel-input symbol and $e_n$ is an error signal due to
noise and residual signal distortion. We describe the algorithm for quaternary
modulation. In this case, $a_n \in \{-3, -1, +1, +3\}$.

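We first transform the equalizer output signal $y_n$ into a full-response
signal $u_n$ by channel inversion via mixed linear feedback and
decision feedback:

$$u_n = y_n + \varrho\, u_{n-2} + (1 - \varrho)\, \hat{a}_{n-2}, \quad (3)$$

where $\hat{a}_n$ is a tentative quaternary decision on the transmitted symbol
$a_n$ based on the signal $u_n$, and $0 \le \varrho \le 1$. We then define a
pseudo-error $\varepsilon_n$ by

$$\varepsilon_n = \begin{cases} u_n - \hat{a}_n & \text{if } |u_n| \ge 3 \\ -\delta_n\, \mathrm{sign}(u_n) & \text{otherwise,} \end{cases} \quad (4)$$

where $\delta_n$ is a non-negative value updated at each iteration as follows:

$$\delta_{n+1} = \begin{cases} \delta_n - \frac{3}{4}\Delta & \text{if } |u_n| \ge 3 \\ \delta_n + \frac{1}{4}\Delta & \text{otherwise,} \end{cases} \quad (5)$$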

and $\Delta$ is a positive constant. The generation of the pseudo-error $\varepsilon_n$
is based on a priori knowledge of the statistics of the signal $u_n$. In the
case of accomplished equalization, $u_n$ corresponds to the quaternary
channel input symbol $a_n$ embedded in noise. Therefore, whenever the
event $|u_n| \ge 3$ is observed, we can use $u_n - \hat{a}_n$ as a trusted error to
update the equalizer coefficients. If we observe the event $|u_n| < 3$, no
trusted error is available. In this case, we choose to update the equalizer
coefficients so that the probabilities of the events $|u_n| < 3$ and
$|u_n| \ge 3$ assume the values 3/4 and 1/4, respectively, which are the
probabilities of these events for an ideally equalized, noisy quaternary
signal. This is achieved by setting the pseudo-error equal to
$-\delta_n\, \mathrm{sign}(u_n)$ whenever $|u_n| < 3$ and updating the value of $\delta_n$ at
each iteration so that $\delta_n$ becomes larger if the event $|u_n| < 3$ occurs
more often than expected and smaller otherwise.

The  LMS  algorithm  for  self-training  adaptive  equalization  is  given 
by 

$$\mathbf{c}_{n+1} = \mathbf{c}_n - \alpha\, (\varepsilon_n - \varrho\, \varepsilon_{n-2})\, \mathbf{x}_n, \quad (6)$$


where $\alpha$ is the adaptation gain.
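
One iteration of the procedure in (1) and (3)-(6) might be sketched as follows. This is a simplified illustration, not the authors' implementation. Several details are assumptions: the step sizes $3\Delta/4$ and $\Delta/4$ in the $\delta_n$ update (chosen so that $\delta_n$ is stationary when the events $|u_n| \ge 3$ and $|u_n| < 3$ occur with probabilities 1/4 and 3/4), the clipping of $\delta_n$ at zero, the convention sign(0) = +1, and all function and variable names.

```python
LEVELS = (-3.0, -1.0, 1.0, 3.0)   # quaternary symbol alphabet

def sign(v):
    """Sign function with the assumed convention sign(0) = +1."""
    return 1.0 if v >= 0 else -1.0

def decide(u):
    """Tentative quaternary decision: nearest symbol level to u."""
    return min(LEVELS, key=lambda a: abs(u - a))

def new_state(delta0=1.0):
    """Each list holds [value at time n-2, value at time n-1]."""
    return {"u": [0.0, 0.0], "a_hat": [0.0, 0.0], "eps": [0.0, 0.0], "delta": delta0}

def self_training_step(c, x, state, rho, alpha, Delta):
    """Update coefficients c from delay-line contents x; returns (new c, u_n)."""
    y = sum(ci * xi for ci, xi in zip(c, x))                           # (1)
    u = y + rho * state["u"][0] + (1.0 - rho) * state["a_hat"][0]      # (3)
    a_hat = decide(u)
    if abs(u) >= 3.0:
        eps = u - a_hat                                                # (4), trusted error
        delta = state["delta"] - 0.75 * Delta                          # (5), assumed step
    else:
        eps = -state["delta"] * sign(u)                                # (4), pseudo-error
        delta = state["delta"] + 0.25 * Delta                          # (5), assumed step
    delta = max(delta, 0.0)          # keep delta non-negative (assumed clipping)
    err = eps - rho * state["eps"][0]                                  # error for PR output
    c_next = [ci - alpha * err * xi for ci, xi in zip(c, x)]           # (6)
    state["u"] = [state["u"][1], u]
    state["a_hat"] = [state["a_hat"][1], a_hat]
    state["eps"] = [state["eps"][1], eps]
    state["delta"] = delta
    return c_next, u

state = new_state()
c, u = self_training_step([1.0, 0.0], [4.0, 0.0], state, rho=0.5, alpha=0.01, Delta=0.1)
print(c, u, state["delta"])
```

In a receiver, `self_training_step` would be called once per symbol interval with the current delay-line contents until the pseudo-error is small enough to switch to decision-directed operation.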

We  present  simulation  results  which  show  that  convergence  is 
achieved  even  in  the  presence  of  significant  initial  clock  drift.  The  new 
algorithm  outperforms  the  known  self-training  technique  for  PRIV 
systems  by  Sato  [4]  in  terms  of  speed  of  convergence  and  achievable 
mean-square  error  in  the  steady-state.  The  new  approach  has  been  real¬ 
ized  in  a  prototype  transceiver  for  full-duplex  transmission  at  125 
Mbit/s  over  telephone-grade  twisted-pair  cables,  which  also  employs 
adaptive  near-end  crosstalk  cancellation. 
Abstract — Results regarding two aspects of joint
Maximum  Likelihood  (ML)  data  sequence  and  inter¬ 
symbol  interference  channel  estimation  are  consid¬ 
ered:  (i)  the  joint-ML  estimation  problem  is  ill-posed 
when  based  on  the  continuous  time  observation,  and 
(ii)  processing  based  on  a  discrete  time  signal  model 
yields  equivalence  classes  of  data  sequences. 

I.  Continuous  Time  Processing 

We consider joint-ML estimation of a digital data sequence
$\{a_i\}$ and a dispersive channel impulse response $h(t)$ from the
complex baseband model

$$r(t) = y(t) + n(t) = \sum_i a_i h(t - iT) + n(t), \quad (1)$$

where $n(t)$ is the baseband equivalent of additive white Gaussian
noise. The data sequence is assumed to be independent and
uniformly distributed over a finite alphabet, while $h(t)$ is assumed
to be a static, deterministic function with support contained
in $[0, LT)$.

A “chipped” signal notation is introduced to convert an
arbitrary function $m(t)$ into a vector of chip functions

$$\mathbf{m}_k(t) = [\, m_k(t) \;\; \cdots \;\; m_0(t) \,]^T, \quad (2)$$

where the $i$th chip is $m_i(t) = m(t + iT)$ for $t \in [0, T)$ and zero
otherwise. Applying this notation to the model in (1) for the
observation interval $[0, kT)$ yields

$$\mathbf{r}_k(t) = \mathbf{y}_k(t) + \mathbf{n}_k(t) = \mathbf{A}_k \circ \mathbf{h}(t) + \mathbf{n}_k(t), \quad (3)$$

where the $((k+1) \times L)$ Toeplitz data matrix $\mathbf{A}_k$ has $i$th row

$$[\, a_{i-L+1} \;\; a_{i-L+2} \;\; \cdots \;\; a_i \,]. \quad (4)$$

For a hypothesized $\mathbf{A}_k$, minimization of the joint-ML metric
over $\mathbf{h}(t)$ results in the critical point $\hat{\mathbf{h}}(t; \mathbf{A}_k) = \mathbf{A}_k^{+} \circ \mathbf{r}_k(t)$,
where $\mathbf{A}_k^{+}$ is the pseudo-inverse of $\mathbf{A}_k$. Substitution yields a
metric dependent only on the hypothesized data sequence

$$\Gamma_k(\mathbf{A}_k) = \Gamma_k(\mathbf{A}_k, \hat{\mathbf{h}}(t; \mathbf{A}_k)) = \int \mathbf{r}_k^{\dagger}(t) \circ \mathbf{P}_k \circ \mathbf{r}_k(t)\, dt, \quad (5)$$

where $\mathbf{P}_k = \mathbf{A}_k \mathbf{A}_k^{+}$ is the matrix which projects onto the range
of $\mathbf{A}_k$.

The metric suggested in (5) does not exist in the mean-square
sense. To illustrate this, consider a fixed $t \in [0, T)$ so
that

$$E\{\mathbf{n}_k^{\dagger}(t) \circ \mathbf{P}_k \circ \mathbf{n}_k(t)\} = \mathrm{tr}\left(\mathbf{P}_k E\{\mathbf{n}_k(t) \mathbf{n}_k^{\dagger}(t)\} \mathbf{P}_k\right), \quad (6)$$

which is not well defined since $E\{n(t + \tau) n^*(t)\} = N_0 \delta(\tau)$.
The conclusion, i.e., that the joint-ML channel and sequence
estimation problem for the model of (1) is ill-posed, holds
even when the noise is colored [1].

II. Practical Processing
Front-end processing structures exist which neglect a small
amount of high-frequency energy and circumvent the ill-posed
problem [2]. The output of such a front-end is modeled by a
discrete time version of (3): $\mathbf{z}_k = \mathbf{A}_k \circ \mathbf{f} + \mathbf{w}_k$, with a metric
function analogous to (5)

$$\Lambda_k(\mathbf{A}_k) = \|(\mathbf{I} - \mathbf{P}_k) \circ \mathbf{z}_k\|^2. \quad (7)$$

The residual least-squares error metric of (7) can be computed
recursively with $k$, which allows the problem to be formulated
as a tree-search with per-sequence channel estimation [1].
Practical recursive algorithms truncate this search,
maintaining only a finite number of candidate paths.

III. Equivalent Sequences
The metric function in (7) implies that data sequences with
matrices having the same range are indistinguishable. Thus,
two data matrices $\mathbf{A}_k$ and $\mathbf{D}_k$ are equivalent if the associated
projection matrices are equal: $\mathbf{P}^{A_k} = \mathbf{P}^{D_k}$. We use the
notation $\mathbf{A}_k \equiv \mathbf{D}_k \in \mathcal{E}(\mathbf{A}_k)$, where $\mathcal{E}(\mathbf{A}_k)$ is the set of all
admissible data matrices with the same range as $\mathbf{A}_k$. An
equivalent characterization is that there exists an invertible
$(L \times L)$ matrix $\mathbf{M}$ such that $\mathbf{A}_k = \mathbf{D}_k \mathbf{M}$.

We characterize these classes as either memoryless equivalence
classes or memory equivalence classes. A memoryless
equivalence class is one in which $\mathbf{M}$ is diagonal, and results
from rotational invariance in the symbol constellation.
Theorem: For BPSK signals ($a_k \in \{-1, +1\}$)

$$\lim_{k \to \infty} P\left(\mathcal{E}(\mathbf{A}_k) = \{\mathbf{A}_k, -\mathbf{A}_k\}\right) = 1, \quad (8)$$

where the probability is over all $\mathbf{A}_k$.

The proof follows from two facts: (i) the probability that
all $2^L$ possible values of a row $\mathbf{a}_i$ will appear in $\mathbf{A}_k$ goes to one, and
(ii) for those $\mathbf{A}_k$ which contain all possible rows, there are only
a finite number of $\mathbf{M}$ which yield admissible data matrices.

The  result  suggests  that  for  asymptotically  large  k ,  the  ef¬ 
fect  of  the  equivalence  classes  can  be  negated  by  differential 
encoding  and  decoding  (i.e.,  one  need  only  be  concerned  with 
memoryless  equivalence  classes).  However,  the  effect  of  mem¬ 
ory  equivalence  classes  on  the  short  term  acquisition  proper¬ 
ties  of  practical  algorithms  is  significant. 
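
The notion of equivalent sequences can be made concrete with a small real-valued BPSK example. The sketch below is an illustration under stated assumptions: the helper names, the short sequences, and the observation vector `z` are hypothetical, and the projection is computed by Gram-Schmidt orthogonalization rather than by an explicit pseudo-inverse. With $L = 2$ and three rows, the sequences $(1, 1, -1, -1)$ and $(1, -1, -1, 1)$ give data matrices with the same range (related by a non-diagonal $\mathbf{M}$), so the residual metric of the form (7) cannot distinguish them.

```python
import math

def data_matrix(a, L):
    """Toeplitz data matrix whose rows are L consecutive symbols of the sequence a."""
    return [list(a[i:i + L]) for i in range(len(a) - L + 1)]

def residual_metric(A, z):
    """||(I - P) z||^2, where P projects onto the column space (range) of A."""
    basis = []
    for col in zip(*A):                       # Gram-Schmidt over the columns of A
        v = list(col)
        for q in basis:
            dot = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - dot * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:                      # skip linearly dependent columns
            basis.append([vi / norm for vi in v])
    r = list(z)
    for q in basis:                           # remove the component of z in the range
        dot = sum(qi * ri for qi, ri in zip(q, r))
        r = [ri - dot * qi for qi, ri in zip(q, r)]
    return sum(ri * ri for ri in r)

z = [0.3, -1.2, 0.8]                          # hypothetical front-end output
A = data_matrix([1, 1, -1, -1], 2)
neg_A = data_matrix([-1, -1, 1, 1], 2)        # memoryless equivalence (M = -I)
D = data_matrix([1, -1, -1, 1], 2)            # memory equivalence (non-diagonal M)
E = data_matrix([1, 1, 1, -1], 2)             # different range, not equivalent

for name, mat in (("A", A), ("-A", neg_A), ("D", D), ("E", E)):
    print(name, round(residual_metric(mat, z), 6))
```

The first three metrics coincide while the fourth differs, which illustrates why short observation windows can leave memory equivalence classes that differential encoding alone does not resolve.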
Abstract — Our work introduces a novel data detection
scheme for coded PSK in the presence of unknown
phase. This scheme offers a performance very
close to coherent in cases of $n \ge 20$, and requires a low
complexity.

I.  Introduction 

Coded  PSK  demonstrates  an  extreme  sensitivity  to  unknown 
channel  phase.  Without  a  careful  effort  to  deal  with  this 
phase,  the  gain  of  coded  PSK  may  be  greatly  diminished. 
In  the  case  of  slowly  varying  phase  (e.g.  constant  over  500 
symbols),  several  effective  data  detection  strategies  have  been 
proposed.  However,  in  communication  over  a  channel  with 
rapidly  changing  phase,  these  strategies  are  ineffective.  Re¬ 
cently  proposed  schemes  for  data  detection  in  a  rapid  phase 
change  environment  are  based  on  extending  the  ideas  of  Mul¬ 
tiple  Symbol  Differential  Detection  (MSDD)  to  coded  modu¬ 
lation  (e.g.  [1]).  These  schemes  offer  some  gains  over  coded 
DPSK,  but  they  are  still  unable  to  match  coherent  perfor¬ 
mance.  We  introduce  a  novel  coded  PSK  data  detection 
scheme  which  offers  a  performance  very  close  to  coherent  in 
cases  of  constant  phase  over  20  or  more  symbols.  This  scheme 
requires  a  low  complexity,  and  it  employs  a  Viterbi  Algorithm 
(VA)  implementation. 

II.  Receiver  Design 

The received signal is represented by $\mathbf{r} = (r_0, r_1, \ldots, r_{N-1})$,
where $r_i = a_i e^{j\theta_i} + \eta_i$. Here, the $\eta_i$'s represent samples from an
AWGN source; $\theta_i$ corresponds to the channel's phase rotation;
and $a_i$ corresponds to a differentially encoded MPSK symbol
generated at sample time $i$ by a trellis encoder. The differential
encoding and trellis encoder are chosen to create phase
invariance [2].

Our data detection scheme is based on ML detection.
According to ML detection, the best output sequence $\hat{\mathbf{a}}$,
given a received $\mathbf{r}$, is $\hat{\mathbf{a}} = \arg\max_{\mathbf{a} \in A} p(\mathbf{r}|\mathbf{a})$, where $A$ is
the set of possible $\mathbf{a}$ sequences generated at the transmitter.
Introducing the unknown phase, this becomes $\hat{\mathbf{a}} =
\arg\max_{\mathbf{a} \in A} \int_N p(\mathbf{r}|\mathbf{a}, \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}$, where $\int_N$ refers to $N$th order
integration (one integral per phase $\theta_i$ in $\boldsymbol{\theta}$).

Our derivation continues by introducing the information
regarding phase. It is assumed that $\theta_i$ is constant over a block
of $n$ symbols, that is, $\theta_0 = \theta_1 = \ldots = \theta_{n-1}$, $\theta_n = \theta_{n+1} = \ldots =
\theta_{2n-1}$, and so on. Using this, we simplify our integral equation
and achieve an intermediate result.

We complete our derivation by approximating the continuous
phase space by a discrete phase space. The continuous
phase space is $\Phi = [0, 2\pi)$. However, because we have

1This  work  is  supported  by  NSERC  Grant  OGP/NOll  and 
NSERC  Scholarship  106418 

2  also  with  SPAR  Aerospace  Limited,  Satellite  and  Communica¬ 
introduced a differential encoding and TCM code which create
phase invariance, we can map the output of our receiver
into the correct sector of the phase space by following
our receiver with a differential decoder. Hence, it suffices,
for the purposes of our receiver, to represent the continuous
phase space by a reduced interval $\Theta \subset \Phi$ that starts at 0.
We approximate $\Theta$ using a discrete set $\hat{\Theta}$ of $m$ phase values
indexed by $i = 0, 1, \ldots, m - 1$. It can be shown that $m = 4$
is sufficient to achieve good results. Replacing the continuous
phase space by $\hat{\Theta}$ in our ML equation leads to our final result.
Specifically, the discretizing of the phase space results in
the integrals becoming summations. Additionally, it is easily
shown that each sum is well approximated by the largest term
in the sum. This results in: choose the $\hat{\mathbf{a}}$ from

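$$\max_{\mathbf{a}_0, \theta_0} \sum_{i=0}^{n-1} \ln p(r_i | a_i, \theta_0) + \max_{\mathbf{a}_n, \theta_n | E(\mathbf{a}_0)} \sum_{i=n}^{2n-1} \ln p(r_i | a_i, \theta_n) + \ldots + \max_{\mathbf{a}_{Ln}, \theta_{Ln} | E(\mathbf{a}_{(L-1)n})} \sum_{i=Ln}^{N-1} \ln p(r_i | a_i, \theta_{Ln}), \quad (1)$$

where $\mathbf{a}_0 = (a_0, \ldots, a_{n-1})$, $\mathbf{a}_n = (a_n, \ldots, a_{2n-1})$, and $\mathbf{a}_{Ln} =
(a_{Ln}, \ldots, a_{N-1})$; and $E(\mathbf{a}_0)$ refers to the end node of sequence
$\mathbf{a}_0$.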

III.  Implementation 

This equation is implemented as follows. Consider first the
block of symbols $\mathbf{a}_0 = (a_0, \ldots, a_{n-1})$. We can choose the best
$\mathbf{a}_0$ and $\theta_0$ to each end node $E(\mathbf{a}_0)$, since this is the only term
future symbols depend on. By best, we mean the values which
maximize the first sum in the above equation. This selection
of the best $(\mathbf{a}_0, \theta_0)$ can be carried out by using the VA, with
metric $\ln p(r_i | a_i, \theta_0)$, over the first block of $n$ symbols. Specifically,
four VAs are carried out, one for each possible $\theta_0$. Next,
consider the second block of symbols, $\mathbf{a}_n$. Much like the previous
set $\mathbf{a}_0$, the best $\mathbf{a}_n$ and $\theta_n$ can be chosen to each end node
$E(\mathbf{a}_n)$. Their selection can be carried out using 4 VAs over the
block of $n$ symbols, $\mathbf{a}_n$, each with path metric $\ln p(r_i | a_i, \theta_n)$
(and a unique $\theta_n \in \hat{\Theta}$). Here, each path is weighted by the
appropriate start node value. This continues, in an analogous
fashion, over the remaining blocks of symbols. Putting this
together, we essentially have 4 VAs running over the block of
symbols.

IV.  Performance 

The performance of this scheme increases as $n$ increases. Most
notably, considering rate 2/3, 8-PSK TCM, the performance
of this scheme is very close to coherent for all $n \ge 20$.
Summary - Decision feedback equalization (DFE) is
generalized within the context of linearly modulated data
transmission over intersymbol interference (ISI) channels.
The main motivation for this new approach is that for
channels with severe ISI, linear and decision feedback
equalizers have a poor performance while the Viterbi
algorithm has a complexity that is exponential in the
length of the ISI channel response.

The delayed decision feedback equalizer (DDFE) introduced
in this work applies to FIR as well as IIR channel
responses. It is parametrized by two integer design
parameters $M$ and $L$ with $L \le M$ and a subset
$S \subseteq \{1, \ldots, M\} = \Omega$ of indices with the cardinality of
$S$ being equal to $L$. This DDFE is denoted as $(M, L, S)$-DDFE.
The parameter $M$ is equal to the decision delay
in units of symbol duration, and $L$ determines the computational
complexity per symbol (CCS) of the DDFE algorithm,
which is $O(F^L)$ where $F$ is the data symbol alphabet
size. For a given channel, and fixed values of $M$ and
$L$, the subset $S$ is chosen to optimize the performance
of the DDFE. This optimized DDFE is denoted as the
$(M, L)$-DDFE. The subset optimization adds only to the
design complexity but not the implementation complexity
for a fixed channel. Performance is defined as the SNR
gain over the conventional DFE in the high SNR region.

The connections with previous results are as follows. In
the degenerate case where $M = L = 1$, the DDFE reduces
to the conventional DFE [1]. For a given $L$, when $M = L$,
we have $S = \Omega$, so that the $(L, L, \Omega)$-DDFE is equivalent
to the $(L, L)$-DDFE, which can be shown to be equivalent
to the $(L, 1)$-BDFE (block decision feedback equalizer) of
[2]. For this case, our performance analysis sheds new
light on the BDFE.

Example - Consider a binary PAM-ISI, monic, causal,
min-phase channel $G(z) = \sum_{k=0}^{\infty} g(k) z^{-k}$ with $g(1) = a$
and an antipodal symbol alphabet $\{+1, -1\}$. For the
$(2, 1)$-BDFE, which is also the $(2, 2)$-DDFE, it can be
shown that the SNR gain is given as

$$\eta_{(2,2)\text{-DDFE}} = \begin{cases} 1 + a^2 & \text{if } |a| \le 1/2; \\ 1 + (1 - |a|)^2 & \text{else.} \end{cases}$$

This SNR gain is thus greater than unity, implying a uniformly
better performance than the conventional DFE.
Applying this result for the case of the single-zero channel
model $1 + a z^{-1}$, we can deduce that

$$\eta_{(2,2)\text{-DDFE}} = \begin{cases} \eta_{VA} & \text{if } |a| \le 1/2; \\ \dfrac{1 + (1 - |a|)^2}{1 + a^2}\, \eta_{VA} & \text{else} \end{cases}$$

where $\eta_{VA}$ is the SNR gain over the conventional DFE of
the Viterbi algorithm, so that when $|a| \le 1/2$, the $(2, 2)$-DDFE
has a performance that is indistinguishable from
the more complex Viterbi algorithm.

The $(M, L)$-BDFE of [2] when $L > 1$ is not a useful
generalization of the $(M, 1)$-BDFE. The only “block” size
in the feedback loop that is meaningful in block decision
feedback equalization is 1, the degenerate case. The
reason is as follows. The CCS of the $(M, L)$-BDFE is
$O(F^M)$ and is relatively independent of $L$ (for sufficiently
large values of these parameters so as to ignore polynomial
dependencies). Furthermore, it is outperformed by
the $(M, 1)$-BDFE. A stronger result is that the $(M, L)$-BDFE
is outperformed by the $(N, 1)$-BDFE (or equivalently
the $(N, N)$-DDFE) where $N = M - L + 1$. Therefore,
among the $(M, L)$-BDFEs, those with $L > 1$ can be
outperformed by the corresponding $(N, 1)$-BDFE, which
has a better performance and a lower complexity.

The $(M, L)$-DDFE with $M > L$, on the other hand,
performs no worse than the $(L, 1)$-BDFE (or equivalently
the $(L, L)$-DDFE). This is an appropriate comparison because
the CCS of both these schemes is given by $O(F^L)$.
Consequently, even the best candidates from the BDFEs
can be improved for the same CCS by the DDFEs.
Moreover, the $(M, L_1)$-DDFE uniformly outperforms the
$(M, L_2)$-DDFE when $L_2 < L_1$, which is to be expected
since the complexity of the former is greater than that
of the latter. No surprises here. The following example
illustrates the superiority of the DDFE over the BDFE
of the same complexity.

Consider a binary PAM-ISI, causal, monic, min-phase
channel $G(z) = \sum_{k \ge 0} g(k) z^{-k}$ and let $g(1) = 1/8$ and
$g(2) = -31/64$. It can be shown that the (3,2)-DDFE
has an SNR gain of 1.2462 relative to the conventional
DFE, whereas the (2,1)-BDFE (or the (2,2)-DDFE) has
an SNR gain of 1.016, in spite of the CCS of the two
algorithms being identical. Furthermore, the matched filter
upper bound on the SNR gain relative to the DFE is
met with equality by the Viterbi algorithm for the
channel $G(z) = 1 + (1/8) z^{-1} - (31/64) z^{-2}$. It is given as
$\eta_{VA} = 1.2502$. Notice that the (3,2)-DDFE performs
nearly as well as the Viterbi algorithm without involving
any sort of trellis detection, and it performs much better
than the (2,1)-BDFE.
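
The figures quoted in this example can be reproduced with a short calculation. The sketch below assumes that the matched filter bound on the SNR gain of a monic channel equals the energy of its impulse response (the sum of the squared taps), which agrees with the quoted value 1.2502, and it evaluates the piecewise (2,2)-DDFE gain stated earlier with alpha = g(1). The function names are illustrative and not taken from the paper.

```python
from fractions import Fraction

def matched_filter_gain(taps):
    """Assumed matched filter bound for a monic channel: sum of squared taps."""
    return sum(Fraction(g) ** 2 for g in taps)

def ddfe22_gain(alpha):
    """Piecewise SNR gain of the (2,2)-DDFE over the conventional DFE."""
    a = abs(Fraction(alpha))
    if a < Fraction(1, 2):
        return 1 + a * a
    return 1 + (1 - a) ** 2

# Channel of the example: G(z) = 1 + (1/8) z^-1 - (31/64) z^-2
taps = [Fraction(1), Fraction(1, 8), Fraction(-31, 64)]
eta_va = matched_filter_gain(taps)
eta_ddfe22 = ddfe22_gain(taps[1])

print(float(eta_va))      # close to the quoted 1.2502
print(float(eta_ddfe22))  # close to the quoted 1.016
```

Exact rational arithmetic is used so that the two branches of the piecewise gain can be compared at the boundary |alpha| = 1/2 without rounding effects.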
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Abstract  —  The  problem  of  implementing  self- 
adaptive  equalization  algorithms  in  real-time  is  ad¬ 
dressed.  Self-adaptive  equalization  determines  the 
transmitted  sequence  without  using  a  training  se¬ 
quence.  Simulation  results  for  the  self-adaptive  tree 
search  procedures  based  on  Fano,  stack  and  M- 
algorithm  are  presented. 

I.  Introduction 

Many problems in digital communications can be modeled by
means of a discrete-time finite-state Markov process
representing the signal, which is observed in independent identically
distributed noise. We consider the case in which the
process parameters are unknown, and we investigate methods
that exploit the structure and finiteness of the state space of
the signal to determine the most likely state sequence without
resorting to a known training sequence. We will refer to this
approach as self-adaptive MLSE.

We will focus our attention on the special case of a
discrete-time finite-state Markov process in which a sequence of equally
likely symbols $s_k$ drawn from a discrete and finite alphabet
A is input to a channel which introduces intersymbol
interference in addition to white Gaussian noise. The coefficients $\theta_l$,
$l = 0, \ldots, L$, of the channel impulse response are assumed to
be unknown but constant. The objective of our work is to
determine the most likely input sequence given the observed
sequence $v_k$ without knowledge of the channel coefficients.

II.  The  Self-Adaptive  MLSE 
In [4] we propose the metric for the self-adaptive MLSE

$$d(\mathbf{s}) = \|P_s \mathbf{v}\|^2, \qquad (1)$$

where s and v are vectors comprising the input symbols and
observations, respectively, S is an $(N+L) \times (L+1)$ matrix
whose columns are shifted versions of s, and $P_s$ is the projection
matrix $P_s = S(S'S)^{-1}S'$. Among all possible input sequences,
we are looking for the one which maximizes the metric in (1).
The optimal sequence is then the one which spans the signal
subspace containing the largest portion of the received signal.

This observation provides the basis for our adaptation of
sequential tree search algorithms, originally developed for
decoding of convolutional codes, to the problem of self-adaptive
equalization. In particular, we consider adaptations of the
Fano algorithm [2], the stack algorithm [3], and the
M-algorithm [1].

As an illustrative example of our results, Figure 1 shows
the results of a series of simulations with sequences of N =
1000 antipodal bits and channels with L = 3 memory
elements. The simulations indicate clearly that the proposed
"self-adaptive" M-algorithm closely matches the performance
of the optimum (Viterbi) search algorithm with known
coefficients if the number of retained paths is chosen sufficiently
large. They also demonstrate that at higher signal-to-noise ratios
the required number of paths to be retained decreases.
Similar results are obtained for the other sequential algorithms.

Figure 1: Simulation Results with M-algorithm (N = 1000, L = 3)
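
The projection metric of the self-adaptive MLSE, the squared norm of the projection of the observation vector v onto the column space of the shifted-symbol matrix S, can be illustrated on a toy problem. The sketch below is an exhaustive search, not one of the sequential tree search algorithms of the paper; the sequence length N = 6, the memory L = 1, the channel coefficients and all function names are assumptions chosen only for illustration, and the observations are noiseless.

```python
from itertools import product

def shift_matrix(s, L):
    """(N+L) x (L+1) matrix whose columns are shifted copies of s."""
    N = len(s)
    return [[float(s[i - l]) if 0 <= i - l < N else 0.0 for l in range(L + 1)]
            for i in range(N + L)]

def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [list(G[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n + 1):
                A[r][k] -= f * A[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (A[r][n] - sum(A[r][k] * x[k] for k in range(r + 1, n))) / A[r][r]
    return x

def metric(s, v, L):
    """d(s) = ||P_s v||^2, computed as (S'v)' (S'S)^{-1} (S'v)."""
    cols = list(zip(*shift_matrix(s, L)))
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(a * y for a, y in zip(ci, v)) for ci in cols]
    return sum(bi * xi for bi, xi in zip(b, solve(G, b)))

# Hypothetical noiseless example with unknown (to the detector) coefficients.
s_true = (1, -1, -1, 1, 1, -1)
theta = (1.0, 0.5)
v = [sum(row[l] * theta[l] for l in range(2)) for row in shift_matrix(s_true, 1)]
energy = sum(y * y for y in v)

best = max(product((1, -1), repeat=6), key=lambda s: metric(s, v, 1))
print(metric(s_true, v, 1), energy, metric(best, v, 1))
```

In the noiseless case the true sequence captures all of the received energy. The metric cannot distinguish a sequence from its negation, since both span the same subspace.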
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Abstract: In this paper, to improve both bandwidth
efficiency and error performance, partial response
signalling (PRS) and trellis coded modulation (TCM) are
combined and denoted as Modified/Quadrature
Partial Response-Trellis Coded Modulation (M/QPR-TCM)
for M-PSK. M/6QPR-TCM and M/9QPR-TCM schemes are
introduced for 4-PSK, and an M/33QPR-TCM scheme for
8-PSK. In a colored noise environment with negative
noise correlation coefficient values, M/QPR-TCM
schemes outperform the classical structures.
In fading channels, the proposed schemes are better than
their counterparts for SNR values greater than a threshold.
In terms of spectral efficiency and bit error rate with
decreasing fading parameter K values, M/QPR-TCM systems
appear to be the best choice in the literature.

Summary 

The block diagram of the M/QPR-TCM scheme consists
of k unit-memory precoders followed by a
rate-$k/(k+1)$ convolutional encoder with $\nu$ units of memory
and a $(1+D)$ PRS with $k+1$ units of memory, which represents
the binary correspondent of the previous signal. The
k appropriate precoders are included in the system
to prevent the undesired catastrophic nature of the
partial response block. The precoders do not increase the
number of trellis states because of the equivalence of the
signals stored simultaneously in the delay cells of the coders
and the PRS. The M/QPR-TCM scheme reduces the number of
states of the combined trellis structure from $2^k 2^\nu 2^{k+1}$,
resulting from the k precoders, the $\nu$-unit convolutional
encoder and the $(k+1)$-unit $(1+D)$ PRS memory, to only $2^{\nu+k}$.
In this paper, to give practical examples, M/6QPR-TCM and
M/9QPR-TCM are introduced for 4-PSK with encoder
memory $\nu = 1$ and $\nu = 2$ respectively, and M/33QPR-TCM
for 8-PSK with encoder memory $\nu = 3$.
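
The size of the state reduction can be checked with a few lines of arithmetic. The sketch below only evaluates the two state-count expressions given above; the values k = 1 for 4-PSK and k = 2 for 8-PSK are assumptions inferred from the rate k/(k+1) encoder, and the function names are illustrative.

```python
def states_separate(k, nu):
    """2^k * 2^nu * 2^(k+1): precoder, encoder and PRS memories counted separately."""
    return 2 ** k * 2 ** nu * 2 ** (k + 1)

def states_combined(k, nu):
    """2^(nu+k): state count of the combined M/QPR-TCM trellis."""
    return 2 ** (nu + k)

# (scheme, assumed k, encoder memory nu)
schemes = [("M/6QPR-TCM", 1, 1), ("M/9QPR-TCM", 1, 2), ("M/33QPR-TCM", 2, 3)]
for name, k, nu in schemes:
    print(name, states_separate(k, nu), "->", states_combined(k, nu))
```

Under these assumptions the reduction factor is $2^{k+1}$ in every case, independent of the encoder memory.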

For many practical trellis coded systems where the noise
is not white, correlation between noise samples affects
error performance [1]-[2]. M/QPR-TCM systems perform
better than the related schemes for negative noise
correlation coefficients.

Under the assumption of ideal channel state information
and infinite interleaving/deinterleaving [3]-[4], analytical
bit error probability upper bounds of the considered
schemes are derived and compared to those of the related
modulation systems in fading channels. M/QPR-TCM structures
are better than their counterparts for SNR values greater
than a threshold for small values of the fading parameter K.
In Rayleigh fading (K = 0), M/6QPR-TCM performs better
for SNR values above 9.2 dB. Similarly, the error performance
improvement of M/6QPR-TCM begins at 9.5 dB
for K = 5 dB. As K increases, where AWGN starts to dominate
the fading, the performance of the M/6QPR-TCM
scheme tends to decrease. In Rayleigh fading, M/9QPR-TCM
performs better at SNR values greater than 12
dB. This improvement begins at 15 dB for Rician (K = 5
dB) fading and vanishes completely for AWGN, as usual.

M/QPR-TCM  systems  appear  to  be  the  best  choice  in 
the  literature  in  terms  of  spectral  efficiency  and  bit  error 
rate  with  decreasing  K  values. 
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Abstract  —  Soft  decoding  of  binary  codes  based 
on  algebraic  decoding  is  treated.  The  algebraic  de¬ 
coder  generates  all  error  patterns  up  to  a  given  weight 
higher than the designed error correcting capability.
The performance of different soft decoders employing
such algebraic decoding is investigated and
compared with hard decoding, the Chase second
algorithm and soft maximum likelihood decoding.
Furthermore, we propose an iterative decoder using an
acceptance criterion to determine if we have found the
maximum likelihood decision estimate. The acceptance
criterion ensures a low average decoding complexity,
and by iterative decoding a performance close
to that of maximum likelihood decoding is obtained.

I.  Introduction 

The type of soft decoders we consider can be described in the
following way. In the first step, the demodulator outputs an
estimate of what was received and it may also output
reliability information on that estimate. In the second step, the
estimate is decoded with an algebraic decoder (with or
without the help of reliability information) into a set of tentative
codewords. Finally, the decoder selects as a decision the codeword
"closest" to the received sequence with respect to the Euclidean
metric. Two well-known decoders of this type are proposed
in [1] and [2]. For a given code the performance of the soft
decoder depends on the reliability information used, the
algebraic decoder's efficiency in finding tentative codewords and
the decision strategy.

The  central  problem  is  how  to  efficiently  generate  a  set  of 
code  words  such  that  it  contains  the  maximum  likelihood  de¬ 
cision  (MLD)  estimate  of  the  transmitted  codeword  with  high 
probability.  Also,  it  is  desirable  to  find  the  MLD  estimate  of 
the transmitted codeword as soon as possible. When can the
generation of tentative codewords be stopped? That is, when
is  the  codeword  corresponding  to  the  maximum  likelihood  de¬ 
cision  in  the  set  of  codewords  already  found? 

II.  The  decoder 

For a given code let $d_{min}$ and $e_{BCH}$ denote the minimum
Hamming distance and the designed error correcting
capability, respectively. The algebraic decoder we use finds all error
patterns of weight at most $t + e$ ($e > 0$), where $2t + 1 = d_{min}$
and $t = e_{BCH}$; see [3]. Our soft algebraic decoder selects as
its decision the "best" codeword, in terms of the Euclidean metric,
among all tentative codewords found. We note that as long
as the covering radius is less than or equal to $t + e$, at least one
codeword is found.
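
The decision rule of this section can be sketched on a small example. The code below is only an illustration under stated assumptions: it uses a (7,4) Hamming code (t = 1) rather than a BCH code from the paper, it replaces the algebraic decoder by a brute-force enumeration of all error patterns of weight at most `radius` around the hard decision, and the received values and function names are hypothetical.

```python
from itertools import combinations, product

# Generator matrix of a systematic (7,4) Hamming code (d_min = 3, t = 1).
G = [(1, 0, 0, 0, 1, 1, 0),
     (0, 1, 0, 0, 1, 0, 1),
     (0, 0, 1, 0, 0, 1, 1),
     (0, 0, 0, 1, 1, 1, 1)]
CODE = {tuple(sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*G))
        for msg in product((0, 1), repeat=4)}

def tentative_codewords(hard, radius):
    """All codewords within Hamming distance `radius` of the hard decision."""
    found = set()
    for w in range(radius + 1):
        for positions in combinations(range(len(hard)), w):
            cand = list(hard)
            for p in positions:
                cand[p] ^= 1
            if tuple(cand) in CODE:
                found.add(tuple(cand))
    return found

def soft_decode(r, radius):
    """Pick the tentative codeword closest to r in Euclidean distance."""
    hard = [0 if x >= 0 else 1 for x in r]
    def dist2(c):
        # Antipodal mapping: bit 0 -> +1, bit 1 -> -1.
        return sum((x - (1 - 2 * bit)) ** 2 for x, bit in zip(r, c))
    return min(tentative_codewords(hard, radius), key=dist2)

# All-zero codeword sent; two weakly negative samples give two hard errors.
r = [0.9, -0.1, -0.2, 1.1, 0.8, 1.0, 0.7]
print(soft_decode(r, 1))  # search radius t: hard-decision result
print(soft_decode(r, 2))  # search radius t + 1: soft decision among more candidates
```

With radius t only one candidate is found, so the decision coincides with hard decoding. Enlarging the radius by one brings the transmitted codeword into the candidate set, and the Euclidean metric then selects it.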

III.  Results 

We  have  compared  different  strong  versions  of  our  decoder 
with  hard  decoding  (t  error  correction),  the  Chase  second  al¬ 
gorithm  and  a  lower  bound  for  soft  maximum  likelihood  de¬ 
cision  (MLD)  decoding.  In  the  evaluation  (simulations)  we 


consider: binary BCH codes, at most $(t + 2)$-error correction,
transmission  over  the  additive  white  Gaussian  noise  channel 
(AWGN),  and  binary  antipodal  modulation.  Our  results  show 
that  decoding  up  to  the  covering  radius  is  important,  i.e.,  such 
that  at  least  one  codeword  is  found.  Then,  at  least  for  the 
cases  we  have  considered,  the  soft  algebraic  decoder  performs 
better  than  the  Chase  second  algorithm. 

IV.  Further  improvements 

If  performance  close  to  that  of  soft  MLD  decoding  is  desired  we 
propose  to  use  an  iterative  decoder  employing  MLD  estimate 
tests.  That  is,  a  test  which  can  determine  if  we  have  found 
the  MLD  estimate.  From  a  practical  point  of  view  iterative 
decoding is probably a better option than generating error
patterns  of  weight  much  higher  than  t.  That  is,  generating 
such  error  patterns  is  complicated,  very  many  may  exist  and 
the  decoder  has  to  be  designed  for  the  worst  case.  On  the 
other  hand,  such  error  patterns  seldom  have  to  be  considered 
if  an  MLD  estimate  test  is  used. 

The  proposed  MLD  tests  are  based  on  comparing  with  a 
competing word which is "close" to the received word. Related
tests for t-error correction can be found in the literature.
However,  our  tests  are  developed  for  a  decoder  correcting  more 
than  t  errors.  This  makes  the  test  more  efficient. 

When  the  codeword  tested  is  not  the  MLD  estimate  the 
MLD  estimate  will  hopefully  be  close  to  the  competing  word. 
We  show  that  this  often  is  the  case.  Then  as  a  second  de¬ 
coding  attempt,  when  the  MLD  estimate  test  fails,  we  decode 
the  competing  word.  We  can  continue  and  generate  a  second 
competing  word  and  perform  a  third  decoding  attempt  and 
so on. It is important that the algebraic decoder corrects up
to  the  covering  radius  in  Hamming  metric.  Such  an  algebraic 
decoder  ensures  that  at  least  a  fairly  good  estimate  of  the 
transmitted  codeword  is  found  already  in  the  first  decoding 
attempt. 

We  have  investigated  two  versions  of  iterative  decoding, 
at  most  two  decoding  attempts  and  at  most  three  decoding 
attempts.  In  both  versions,  however,  due  to  the  MLD  estimate 
tests,  the  average  number  of  decoding  attempts  is  close  to 
one.  For  the  cases  we  have  investigated  the  iterative  decoder 
is  much  more  powerful  than  the  Chase  second  algorithm  and 
its  performance  is  close  to  that  of  soft  MLD  decoding. 
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Abstract — The quaternary Goethals code is a $Z_4$-linear
code of length $2^m$ which has $2^{2^{m+1}-3m-2}$ codewords
and minimum Lee distance 8 for any odd $m \ge 3$.
The Gray map of this code is known to be a nonlinear
binary $(2^{m+1}, 2^{2^{m+1}-3m-2}, 8)$ code. The covering radius
of the $Z_4$-linear Goethals code is 6 and we present a
complete decoding algorithm for the code.


I.  Introduction 


Let $Z_4$ denote the ring of integers modulo 4 and let R be a
Galois ring of characteristic 4 with $4^m$ elements. The
multiplicative group of units in R contains a unique cyclic subgroup
of order $2^m - 1$. Let $\beta$ be a generator of this subgroup and let
$\mathcal{T} = \{0, 1, \beta, \ldots, \beta^{2^m-2}\}$. Let $\mu : Z_4 \to Z_2$ denote the modulo
2 reduction map. We can extend $\mu$ to R in a natural way and
it can be shown that $\mu(\mathcal{T}) = F$, where F is a finite field of
order $2^m$.

The Gray map $\phi$ is defined by $\phi(0) = 00$, $\phi(1) = 01$, $\phi(2) = 11$
and $\phi(3) = 10$. Let C be the binary code defined by
$C = \phi(\mathcal{C})$, where $\mathcal{C}$ is the quaternary code with parity-check matrix
given by

$$H = \begin{bmatrix}
1 & 1 & 1 & 1 & \cdots & 1 \\
0 & 1 & \beta & \beta^2 & \cdots & \beta^{2^m-2} \\
0 & 2 & 2\beta^3 & 2\beta^6 & \cdots & 2\beta^{3(2^m-2)}
\end{bmatrix}.$$

In Hammons, Kumar, Calderbank, Sloane and Solé [1], it
is shown that if m is odd, then $\mathcal{C}$ has minimum Lee distance 8,
which is equal to the minimum Hamming distance of C. The
binary $(2^{m+1}, 2^{2^{m+1}-3m-2}, 8)$ code C has parameters that are
identical to those of the (extended) binary Goethals code.
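
The equality of the Lee distance of the quaternary code and the Hamming distance of its binary image rests on the fact that the Gray map is distance preserving. The sketch below checks this by exhaustive enumeration over short quaternary words; the function names are illustrative, and the word length 2 is an arbitrary choice that keeps the enumeration small.

```python
from itertools import product

GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}

def gray_map(word):
    """Binary image of a quaternary word, two bits per symbol."""
    return tuple(bit for symbol in word for bit in GRAY[symbol % 4])

def lee_distance(u, v):
    """Sum of the Lee weights of the componentwise differences modulo 4."""
    return sum(min((a - b) % 4, (b - a) % 4) for a, b in zip(u, v))

def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

words = list(product(range(4), repeat=2))
isometry = all(lee_distance(u, v) == hamming_distance(gray_map(u), gray_map(v))
               for u in words for v in words)
print(gray_map((0, 1, 2, 3)), isometry)
```

Because the check runs over all pairs of words, it also covers the nonlinear behaviour of the map: the image of a sum is not the sum of the images, yet distances are preserved.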

The purpose of this paper is to give a complete decoding
algorithm for the triple error-correcting $Z_4$-linear Goethals code
$\mathcal{C}$, i.e., an algorithm that for any received vector finds the
closest codeword.


II.  Decoding  of  the  Goethals  code 
Let $r \in Z_4^{2^m}$ be the received vector and let $e \in Z_4^{2^m}$ be
the error vector. The syndrome of the received vector is
$S = rH^{tr} = eH^{tr} = (t, A + 2B, 2C)$, where $t \in Z_4$, $A, B, C \in \mathcal{T}$ and
$H^{tr}$ denotes the transpose of H. We index the components
of a vector $e \in Z_4^{2^m}$ by the elements of $\mathcal{T}$, i.e. $e = (e_X)_{X \in \mathcal{T}}$.
The syndrome equations that have to be solved are

$$\begin{aligned}
\sum_{X \in \mathcal{T}} e_X &= t, & t &\in Z_4, \\
\sum_{X \in \mathcal{T}} e_X X &= A + 2B, & A, B &\in \mathcal{T}, \\
2 \sum_{X \in \mathcal{T}} e_X X^3 &= 2C, & C &\in \mathcal{T}.
\end{aligned}$$

Let X, Y, A, B, etc. denote elements in $\mathcal{T}$ and x, y, a, b
their respective projections modulo 2 in F. For any coset it is
sufficient to find the projections x, y, and z in F of the error
locations of a coset leader and the corresponding error values
$e_x$, $e_y$, and $e_z$ in $Z_4$ which satisfy the syndrome equations.

We first find the unique coset leader (i.e., a vector of
smallest Lee weight) of each coset which contains a vector of Lee
weight $\le 3$. As an example, the decoding of cosets
corresponding to syndromes with t = 1 is given below. The cases
t = 0, 2 and 3 are similar.

Theorem 1 Let $S = (1, A + 2B, 2C)$ denote the syndrome
of a coset.

(i) If $b = 0$ and $c = a^3$, then the coset leader has Lee weight
1 and is uniquely determined by $x = a$ and $e_x = 1$.

(ii) If $b \ne 0$ and $c = a^3$, then the coset leader has Lee
weight 3 and is uniquely determined by $x = a + b$, $e_x = 2$,
$y = a$ and $e_y = -1$.

(iii) If $b \ne 0$, $c \ne a^3$ and $\mathrm{Tr}(b^3/(a^3 + c)) = 0$, then the
coset leader has Lee weight 3. The coset leader is uniquely
determined such that x and y are solutions of
$b^2u^2 + (a^3 + c)u + a^4 + a^2b^2 + ac + b^4 = 0$, $e_x = e_y = 1$,
$z = a + (a^3 + c)/b^2$ and $e_z = -1$.

(iv) If $\sigma(u) = u^3 + au^2 + (a^2 + b^2)u + ab^2 + c$ has three
distinct zeros in F, then a coset leader has Lee weight 3 and
is uniquely determined such that x, y, z are the three distinct
zeros in F of $\sigma(u)$ and $e_x = e_y = e_z = -1$.

(v) If none of (i)-(iv) hold, then any coset leader has Lee
weight $\ge 5$.


III.  Complete  decoding 


In the considerably more complicated cases when more than
3 errors occur, we show how to construct a coset leader in any
coset. In addition, we prove the following results.

Theorem 2 (i) For any coset with syndrome $S = (0, A + 2B, 2C)$,
there exists a coset leader of Lee weight $\le 6$.

(ii) For any coset with syndrome $S = (t, A + 2B, 2C)$ where
t = 1 or t = 3, there exists a coset leader of weight $\le 5$.

(iii) Let $m \ge 5$; then for any coset with syndrome
$S = (2, A + 2B, 2C)$ there exists a coset leader of weight $\le 4$.

Theorem 3 Let $D_i$ denote the number of cosets with a
coset leader of weight i in the $Z_4$-linear Goethals code.

(i) If $m \ge 5$ then $D_0 = 1$, $D_1 = \binom{2^{m+1}}{1}$, $D_2 = \binom{2^{m+1}}{2}$,
$D_3 = \binom{2^{m+1}}{3}$, $D_4 = 2^{3m+1} - 1 - \binom{2^{m+1}}{2} - D_6$,
$D_5 = 2^{3m+1} - \binom{2^{m+1}}{1} - \binom{2^{m+1}}{3}$ and
$D_6 = (2^m - 1) \cdot [\ldots]$ (the second factor of $D_6$ is illegible in the source).

(ii) If $m = 3$ then $D_0 = 1$, $D_1 = 16$, $D_2 = 120$, $D_3 = 480$,
$D_4 = 823$, $D_5 = 528$ and $D_6 = 80$.
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Abstract  —  We  show  that  the  problems  solved  by  the 
Berlekamp-Massey  and  Welch-Berlekamp  algorithms 
are  special  instances  of  a  more  general  problem  which 
has  been  studied  (in  the  characteristic  zero  case)  by 
control  theorists.  We  present  an  algorithm  to  solve 
this  general  problem  which  can  be  used  to  find  the 
solutions  to  both  the  classical  Key  Equation  and  the 
Welch-Berlekamp  interpolation  problem. 

Summary 

Classically,  the  decoding  of  a  Reed-Solomon  code  is  carried 
out  by  calculating  power  sum  syndromes  and  then  using  the 
Berlekamp-Massey  algorithm  to  solve  the  resulting  linear  re¬ 
currence  problem  [2,  4].  A  new  approach,  taken  by  Welch 
and  Berlekamp  [5],  is  to  convert  the  decoding  problem  into 
a  rational  interpolation  problem  which  can  then  be  solved  by 
the  Welch-Berlekamp  algorithm.  One  of  the  advantages  of 
this  second  approach  is  that  the  syndromes  do  not  have  to  be 
calculated,  thus  saving  decoder  computations. 

Both  the  Berlekamp-Massey  and  Welch-Berlekamp  algo¬ 
rithms  can  be  thought  of  as  solving  special  instances  of  the 
following  problem. 

The Problem: Let F be a field. If $f(X) = P(X)/Q(X)$
is a rational function of two polynomials with coefficients
in F, we define the complexity $\lambda(f)$ of f to be the integer
$\max\{(\deg P(X)) + 1, \deg Q(X)\}$. Let $x_0, x_1, \ldots, x_{m-1} \in F$ be
distinct. For each $i \in \{0, 1, \ldots, m-1\}$, let $l_i$ be a
nonnegative integer and let $y_{i,0}, y_{i,1}, \ldots, y_{i,l_i} \in F$. We say that the
function $f(X) := P(X)/Q(X)$ is a generalised rational
interpolation if, for all $i \in \{0, 1, \ldots, m-1\}$, the formal power series
of f at $x_i$ is defined and is of the form

$$\sum_{j=0}^{l_i} y_{i,j} (X - x_i)^j + \text{higher terms}.$$

The generalised rational interpolation problem asks for
the generalised rational interpolation f of lowest complexity
$\lambda(f)$.

Thus the generalised rational interpolation problem asks for
the 'smallest' rational function which has specified low order
terms in its power series expansion at certain points. The
Welch-Berlekamp interpolation problem is the special case of
this problem when $l_i = 0$ for all i (since $y_{i,0}$ is simply the
value of f at $x_i$). The problem solved by the
Berlekamp-Massey algorithm can be thought of as the case when m = 1
and $x_0 = 0$.
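
The special case m = 1, $x_0 = 0$ can be made concrete with the classical Berlekamp-Massey algorithm. The sketch below is not the new algorithm announced in this summary; it is a textbook Berlekamp-Massey over GF(2) (the field is an assumption made for brevity), followed by a check that the resulting rational function P/C matches the given power series coefficients and has complexity max(deg P + 1, deg C) equal to the length of the shortest recurrence. The test sequence and function names are illustrative.

```python
def berlekamp_massey_gf2(s):
    """Shortest linear recurrence for the bit sequence s over GF(2).

    Returns (L, c) with c[0] = 1 and, for L <= n < len(s),
    s[n] = c[1]*s[n-1] + ... + c[L]*s[n-L] (mod 2).
    """
    n = len(s)
    c = [1] + [0] * n          # current connection polynomial C(X)
    b = [1] + [0] * n          # copy of C(X) before the last length change
    L, m = 0, -1
    for N in range(n):
        d = s[N]               # discrepancy between s[N] and its prediction
        for i in range(1, L + 1):
            d ^= c[i] & s[N - i]
        if d:
            t = c[:]
            shift = N - m
            for i in range(n + 1 - shift):
                c[i + shift] ^= b[i]
            if 2 * L <= N:
                L, m, b = N + 1 - L, N, t
    return L, c[:L + 1]

def numerator(s, c):
    """Coefficients of P(X) = S(X) C(X) mod X^len(s) over GF(2)."""
    return [sum(c[i] & s[k - i] for i in range(min(k, len(c) - 1) + 1)) % 2
            for k in range(len(s))]

# Power series coefficients y_{0,0}, ..., y_{0,7} at x_0 = 0 (illustrative data).
s = [1, 0, 0, 1, 1, 1, 0, 1]
L, c = berlekamp_massey_gf2(s)
p = numerator(s, c)
deg_p = max(k for k, coeff in enumerate(p) if coeff)
complexity = max(deg_p + 1, len(c) - 1)
print(L, c, p, complexity)
```

Since S(X) C(X) agrees with P(X) modulo $X^8$ and C(0) = 1, the power series of P/C at 0 starts with the given coefficients, which is the interpolation condition of the problem statement for a single point.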

When  F  has  characteristic  zero,  there  is  a  close  relation¬ 
ship  between  formal  power  series  and  Taylor  series:  We  may 
regard  the  generalised  rational  interpolation  problem  as  ask¬ 
ing  for  the  lowest  complexity  rational  function  with  specified 
low  order  derivatives  at  certain  points.  This  is  a  problem  in 
control  theory  studied  by  Antoulas  and  Anderson  [1].  So  we 


can  regard  the  problem  above  as  generalising  their  problem  to 
fields  of  arbitrary  characteristic. 

We  present  a  new  algorithm  (a  close  analogue  of  the 
‘Welch-Berlekamp’  algorithm  of  Chambers  et  al  [3])  which 
solves  the  generalised  rational  interpolation  problem.  Like 
the  Berlekamp-Massey  algorithm,  the  data  can  be  fed  into 
our algorithm serially. The algorithm uses $O(n^2)$ field
operations, where $n = \sum_{i=0}^{m-1} (l_i + 1)$. The algorithm can be used
in place of the Berlekamp-Massey or Welch-Berlekamp
algorithms, since both problems are special cases of the generalised
rational interpolation problem.
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Abstract  —  We  consider  the  optimal  strategy  for 
erasing  symbols  in  concatenated  coding  schemes.  This 
erasing  strategy  uses  a  posteriori  likelihoods  of  RS  sym¬ 
bols  to  determine  erasures  which  maximize  the  prob¬ 
ability  of  decoding  correctly.  Some  properties  of  per¬ 
formance  of  this  strategy  are  presented.  Erasing  rules 
for  decoding  the  same  received  word  more  than  once 
are  also  examined. 

I.  Introduction 

Various kinds of erasing strategies [1] have been explored
by many researchers since Forney first presented concatenated
coding schemes. We are interested in the erasing strategy
which maximizes the probability of decoding an outer word
correctly given a posteriori likelihoods of all RS symbols. The
first stage of the optimal erasing rule [2] is to erase the most
unreliable symbol if D, the minimum distance of the RS code,
is even and to erase nothing if D is odd. Then symbols should be
erased in pairs in order of ascending reliability, thus keeping
the difference between D and the number of erasures odd.
Let $p_1, p_2, \ldots, p_N$ be the error probabilities of the symbols
provided by the inner decoder in an RS word, with
$p_1 \le p_2 \le \cdots \le p_N$. $P_h(\mathbf{p}_i)$ denotes the probability that h errors occur
in the i most reliable symbols. Given that the most unreliable
e symbols have been erased in an RS word, erasing the next
two symbols, with error probabilities $p_{N-e}$ and $p_{N-e-1}$, can
increase the probability of decoding correctly if and only if

$$\frac{P_{h-1}(\mathbf{p}_j)}{P_h(\mathbf{p}_j)} > \frac{q_{j+1}\, q_{j+2}}{p_{j+1}\, p_{j+2}}, \qquad (1)$$

where $j = N - e - 2$, $h = (D - e - 1)/2$, and $q_i = 1 - p_i$ for
$1 \le i \le N$.
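
Condition (1) can be checked numerically on small examples. The sketch below assumes independent symbol errors, computes the distribution of the number of errors among the most reliable symbols by a simple recursion, and compares the left and right hand sides of (1). The error probabilities, the code parameters (N = 7, D = 5) and the function names are hypothetical.

```python
def error_count_distribution(p):
    """dist[h] = probability of exactly h errors among independent symbols."""
    dist = [1.0]
    for pi in p:
        new = [0.0] * (len(dist) + 1)
        for h, prob in enumerate(dist):
            new[h] += prob * (1 - pi)
            new[h + 1] += prob * pi
        dist = new
    return dist

def prob_correct(p_sorted, D, e):
    """Probability of correct decoding when the e least reliable symbols are erased."""
    kept = p_sorted[:len(p_sorted) - e]
    h = (D - e - 1) // 2
    return sum(error_count_distribution(kept)[:h + 1])

def two_more_erasures_help(p_sorted, D, e):
    """Evaluate condition (1) for erasing the next two symbols."""
    j = len(p_sorted) - e - 2
    h = (D - e - 1) // 2
    dist = error_count_distribution(p_sorted[:j])
    p1, p2 = p_sorted[j], p_sorted[j + 1]   # p_{j+1}, p_{j+2} in 1-based notation
    lhs = dist[h - 1] / dist[h]
    rhs = (1 - p1) * (1 - p2) / (p1 * p2)
    return lhs > rhs

# Sorted in ascending order of error probability (most reliable first).
unreliable_tail = [0.01, 0.02, 0.03, 0.05, 0.1, 0.6, 0.7]
all_reliable = [0.01] * 7
for p in (unreliable_tail, all_reliable):
    print(two_more_erasures_help(p, 5, 0),
          prob_correct(p, 5, 2) > prob_correct(p, 5, 0))
```

In both cases the outcome of the test agrees with a direct comparison of the two probabilities of correct decoding, as the "if and only if" in the statement requires.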

However, if a decoder erases symbols in pairs until (1) fails
to hold, the resulting probability of decoding correctly is not
necessarily the maximum obtainable. Here we present
different approaches to simplify the search for the optimal number
of erasures by exploring bounds on $P_{h-1}(\mathbf{p}_j)/P_h(\mathbf{p}_j)$.

II.  The  Optimal  Number  of  Erasures 

The first problem encountered is how to evaluate $P_h(\mathbf{p}_j)$.
Although it can be calculated exactly, an easily calculated
estimate is preferred. Barbour [3] derived an asymptotic
expansion for $P_h(\mathbf{p}_j)$. This allows us to have an approximation to
the left hand side of (1) and a bound on the error of
approximation. However, as more accuracy is required, more terms
in the expansion should be included in the approximation and
the complexity increases dramatically.

The RHS of (1) decreases with j while the LHS of (1) is
not necessarily increasing with j. If the LHS of (1) is
increasing with j, the probability of decoding correctly
apparently has only one local maximum with respect to the number
of erasures. We show that the LHS of (1) increases with j if
$p_{N-D+1} = p_{N-D+2} = \cdots = p_N$. Given that there are two
different symbol error probabilities, $p_{low}$ and $p_{high}$, among all N
symbols, we also show that the LHS of (1) increases with j if
$p_{low}$ and $p_{high}$ satisfy an inequality. Basically this inequality
gives an upper bound on $p_{high}$ in terms of $p_{low}$.

The derivative of $P_{h-1}(\mathbf{p}_j)/P_h(\mathbf{p}_j)$ with respect to $p_i$ is
always non-positive, $1 \le i \le j$. This observation enables
us to find several upper and lower bounds on the optimal
number of erasures. An upper bound on $P_{h-1}(\mathbf{p}_j)/P_h(\mathbf{p}_j)$ is
$\frac{h}{j-h+1} \cdot \frac{q_1}{p_1}$. Since this bound increases with j, given a fixed
integer $m \in S = \{N-D+1, N-D+3, \ldots, N-2\}$, the optimal
number of erasures of the RS word is not more than $N - m - 2$
if [inequality illegible in the source]. Note that if the inequality holds for
some m, changing $p_2, \ldots, p_m$ arbitrarily cannot increase the
optimal number of erasures to more than $N - m - 2$. A
drawback of this bound is that it depends solely on $p_1$ and becomes
very loose when $p_1$ is small compared to the other $p_i$'s. Two more
upper bounds can be obtained to fix this problem. One bound
is based on $p_{j-h+1}, \ldots, p_j$ instead of $p_1$. The other bound is
based on $p_1$ and $p_a$, the average of $p_1, \ldots, p_j$. Examples show
that applying these three upper bounds on $P_{h-1}(\mathbf{p}_j)/P_h(\mathbf{p}_j)$
often gives a very tight upper bound on the optimal number of
erasures. Similar approaches can be used to find lower bounds
as well. Experiments show that the upper and lower bounds meet
very often.

III.  Results  for  Multiple  Decodings 
Assume that the first decoding uses the optimal erasing
strategy described above. If the first decoding fails to
decode, we discuss the best erasing rule that the second
decoding should use. Here we show that the probability of decoding
correctly when $e - i$ symbols are erased is always larger than
that when $e + i$ symbols are erased, where e is the number
of erasures of the first decoding and all $p_i$'s are less than one
half. If the second decoding erases fewer symbols than the first
decoding and all symbols erased by the first decoding have
the same error probability, we show that the optimal number
of erasures for the second decoding is either one or zero, no
matter what e is. Erasing rules for decoding more than twice
are also discussed.
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Abstract - A new error-and-erasure decoding procedure that
decodes cyclic codes up to the actual minimum distance is
presented. This procedure annihilates erasure effects from
syndrome matrices and produces modified syndrome matrices
that can be used to obtain error locations with an
error-only decoding algorithm.

I. Introduction

This  paper  presents  a  new  error-and-erasure  decoding  procedure 
that  produces  erasure  masking  matrices  to  annihilate  erasure  effects 
from  original  syndrome  matrices. 

In  general,  an  error-and-erasure  decoding  procedure  is  based  on 
error-only  decoding  algorithms.  In  [2],  Forney,  based  on  Peterson, 
Gorenstein  and  Zierler’s  earlier  work  [8],  introduced  an  error-and- 
erasure  decoding  procedure  that  can  decode  up  to  the  BCH  bound. 
Later,  in  [6],  Shahri  and  Tzeng  developed  an  error-and-erasure 
decoding  algorithm  to  decode  cyclic  codes  up  to  the  HT  bound.  The 
procedure  in  [6]  uses  Feng  and  Tzeng’s  algorithm  for  Multi¬ 
sequence  Shift-Register  Synthesis  [12].  Then,  an  error-and-erasure 
decoding  procedure  up  to  special  cases  of  the  Roos  bound  was 
given  by  Shahri,  Tzeng  and  Jensen[10]. 

Recently,  Feng  and  Tzeng  introduced  algorithms  for  error-only 
decoding  of  cyclic  codes  up  to  the  actual  minimum  distance [1,9]. 
The  algorithm  in  [1]  uses  the  nonrecurrent  syndrome  dependence 
relations  among  the  known  syndromes.  In  [9],  they  determined  the 
unknown  syndromes  by  employing  a  (2t+l)  x  (2t+l)  syndrome 
matrix  and  majority  voting  method.  The  error-and-erasure  decoding 
procedure  presented  in  this  paper  is  based  on  Feng  and  Tzeng’s 
recent  work  [1,9]. 

II.  Decoding  Procedure 

The  procedure  presented  in  this  paper  generates  erasure  masking 
matrices  which  annihilate  all  erasure  effects  in  a  syndrome  matrix. 
Thus,  it  converts  an  error-and-erasure  decoding  problem  to  an 
error-only decoding problem. Furthermore, since it produces
modified syndrome matrices which are homomorphic images of the
original syndrome matrices, error-only decoding algorithms, e.g. Feng
and Tzeng's algorithms [1,9], can be applied.

A  brief  description  of  our  decoding  procedure  is  given  below: 

Step  1.  Construct  a  syndrome  matrix  S  just  as  for  any  error-only 
decoding  case. 

Step 2. Partition the p erasure locations into two arbitrary groups,
say $G_1$ and $G_2$, where $G_1 = (a_1, a_2, \ldots, a_k) = (F_1, F_2, \ldots, F_k)$
and $G_2 = (a_{k+1}, a_{k+2}, \ldots, a_p) = (F_{k+1}, F_{k+2}, \ldots, F_p)$; the $F_i$
are the erasure locations and $k = \lfloor (p+1)/2 \rfloor$.

Step 3. From $G_1$ and $G_2$, construct erasure masking matrices $\mu$ and
A such that $\mu$ masks the erasures in $G_1$ and A masks the erasures in $G_2$.

Step 4. Compute a modified syndrome, $U = \mu S A = \mu E A + \mu F A$,
where E is the error portion of a syndrome matrix and F is
the erasure portion of a syndrome matrix. Since the $\mu$ and A
matrices mask all erasures, $\mu F A = 0$ and $U = \mu S A = \mu E A$.

Step 5. Use an error-only decoding algorithm to find a modified
error locator polynomial y such that Uy = 0.

1This  work  was  supported  in  part  by  the  National  Science  Foun¬ 
dation  under  Grant  NCR-9016095  and  9406043. 


Step 6. Obtain the coefficients of an error-locating polynomial
f(z) from Ay = f.

Step 7. Use the Chien search to find the roots of f(z).

If the number of roots of f(z) that are nth roots of unity is less than
(d - 1)/2, then all the error locations are found.

If not, go to Step 8.

Step  8.  Compute  modified  unknown  syndromes  using  error-only 
decoding  algorithms  presented  in  [1]  or  [9].  Then,  find  the 
values  of  unknown  syndromes  from  the  computed  modified 
unknown  syndromes. 

Step 9. If all unknown syndromes can be found, obtain a codeword
by means of the inverse Fourier transform.

Step 4 yields a modified syndrome matrix U which is a homomorphic
image of the syndrome matrix in Step 1. Thus, the error-only
decoding algorithm associated with the syndrome matrix of Step 1 can
be applied to the matrix U to solve for the error locations.

In summary, we developed an efficient, systematic error-and-erasure
decoding procedure using erasure masking matrices $\mu$ and A that
can be applied to any type of syndrome matrix. Therefore, our procedure
can be used with any error-only decoding algorithm as long
as it uses a syndrome matrix.

Abstract — A common way to deal with bursts in
data  storage  systems  is  to  interleave  byte-error  cor¬ 
recting  codes.  In  the  decoding  of  each  of  these  byte- 
error  correcting  codes,  one  normally  does  not  make 
use  of  the  information  obtained  from  the  previous  or 
subsequent  code,  while  the  bursty  character  of  the 
channel  indicates  a  dependency. 

In  [1]  such  a  dependency  is  exploited  to  enhance  the 
decoding  performance.  Here  a  different,  but  similar 
approach  is  proposed. 

I.  Introduction 

In  order  to  correct  burst  errors  in  a  data  storage  channel,  the 
most  common  procedure  is  to  interleave  an  error-correcting 
code  to  a  certain  depth.  This  error  correcting  code  is  normally 
a  byte-error  correcting  code,  like  a  Reed-Solomon  code.  The 
depth  of  interleaving  determines  the  burst  correcting  power  of 
the  interleaved  scheme.  In  this  way,  the  bursts  are  “random¬ 
ized”  into  different  codewords.  Each  codeword  sees  a  random 
error  event. 

Although  interleaving  is  an  efficient  approach,  it  throws 
away  information,  since  it  ignores  the  fact  that  in  a  bursty 
channel errors are usually correlated. Ways of exploiting this
correlation in order to forecast errors were studied in the literature
[1] with the introduction of the so-called “helical”
interleavers. Here, we introduce a different, but somewhat
similar  technique.  Even  when  the  error-correcting  capability 
of  the  error  correcting  code  has  been  exceeded,  one  can  still, 
by  making  use  of  these  methods,  retrieve  the  data  in  many 
cases. 

II.  A  GENERAL  DESCRIPTION 

In  its  most  basic  form,  the  procedure  works  as  follows:  the 
decoder  decodes  normally  using  the  interleaved  scheme.  How¬ 
ever,  if  a  codeword  is  uncorrectable  due  to  too  many  errors, 
it  is  flagged.  Then,  an  attempt  to  decode  it  again  using  the 
previous  and/or  following  codeword  is  made.  To  this  end,  the 
decoder  declares  erasures  in  the  locations  corresponding  to  er¬ 
rors  in  the  previous  and/or  following  codeword.  If  errors  have 
occurred  in  bursts,  it  is  likely  that  the  decoding  power  will 
be  enhanced,  since  a  code  can  correct  roughly  twice  as  many 
erasures  as  errors.  We  will  present  several  variations  of  this 
strategy,  that  trade  reliability  with  decoding  power.  We  will 
show  how  to  adapt  the  method  to  channels  that  suffer  from 
bursts  as  well  as  from  random  errors  at  the  same  time.  We  will 
also  introduce  a  toroidal  interleaving  method  that  eliminates 
the  lack  of  symmetry  between  the  first  and  the  last  codeword 
in  a  regular  interleaving  scheme. 

The toroidal scheme works as follows: if A is the depth of
interleaving and n is the length of a codeword, such that A and
n are relatively prime, then symbol $a_{i,j}$ is followed by symbol
$a_{i+1,j+1}$, where $i+1$ is taken modulo A and $j+1$ is taken
modulo n (in normal interleaving, symbol $a_{i,j}$ is followed by
symbol $a_{i+1,j}$ when $0 \le i \le A-2$, and symbol $a_{A-1,j}$ is followed
by symbol $a_{0,j+1}$).
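The two symbol orderings can be compared with a short sketch. It assumes the sequence starts at $a_{0,0}$ and uses the hypothetical helper names `toroidal_order` and `normal_order`; neither name comes from the paper.

```python
from math import gcd


def toroidal_order(depth, n):
    """Transmission order (i, j) when a[i][j] is followed by a[i+1 mod depth][j+1 mod n].

    Assumes the sequence starts at a[0][0]; depth and n must be coprime so that
    every one of the depth*n symbols is visited exactly once.
    """
    if gcd(depth, n) != 1:
        raise ValueError("depth and n must be relatively prime")
    return [(k % depth, k % n) for k in range(depth * n)]


def normal_order(depth, n):
    """Ordinary interleaving: run down a column of the array, then move to the next column."""
    return [(i, j) for j in range(n) for i in range(depth)]


if __name__ == "__main__":
    print(toroidal_order(5, 11)[:7])
    print(normal_order(5, 11)[:7])
```

In the toroidal order a burst of consecutive channel symbols runs along a diagonal of the array, so a burst that wraps from the last row to the first is treated the same way as any other burst.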

III.  An  example 

Below is a simple example of the enhanced interleaved
scheme when A = 5 and n = 11. Assume that each row
implements a code with minimum distance d = 4; it can
therefore correct one error and detect two, as well as correct
one error together with one erasure. Also, assume the toroidal
interleaving described above.


The  x’s  represent  errors  in  the  corresponding  symbols.  As 
we  can  see,  a  burst  of  length  4  and  a  random  error  (in  row 
2)  have  occurred.  The  decoder  detects  an  uncorrectable  error 
pattern  in  row  2,  so  it  flags  that  row.  By  examining  row  1,  it 
finds  that  entry  (1,2)  is  in  error,  and  similarly,  by  examining 
row  3,  it  finds  that  entry  (3,4)  is  in  error.  Therefore,  the 
decoder  will  predict  that  there  was  an  error  in  entry  (2,3), 
so  it  will  declare  an  erasure  there.  Now,  row  2  has  an  error 
and  an  erasure,  which  is  within  its  error-correcting  capability. 
Finally,  the  decoder  corrects  row  2.  Notice  that  this  was  not 
possible  with  the  traditional  scheme. 
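The reasoning in this example can be sketched as follows. The function names are hypothetical, and requiring both neighbouring rows to agree before declaring an erasure is only one of the possible reliability/power trade-offs, chosen here as an assumption. The correctability test is the standard condition 2e + s < d for e errors and s erasures.

```python
def can_correct(num_errors, num_erasures, d):
    """Standard condition for a code of minimum distance d: 2*errors + erasures < d."""
    return 2 * num_errors + num_erasures < d


def predict_erasures(prev_row_errors, next_row_errors, n, require_both=True):
    """Guess erasure columns for a flagged row under toroidal interleaving.

    A burst runs along a diagonal, so an error in column j of the previous row
    points at column j+1 of the flagged row, and an error in column j of the
    next row points at column j-1 (columns taken modulo n).
    """
    from_prev = {(j + 1) % n for j in prev_row_errors}
    from_next = {(j - 1) % n for j in next_row_errors}
    return from_prev & from_next if require_both else from_prev | from_next


if __name__ == "__main__":
    # Row 1 has an error in column 2, row 3 has an error in column 4.
    print(predict_erasures({2}, {4}, n=11))
    # Row 2 before and after the erasure declaration (d = 4).
    print(can_correct(2, 0, 4), can_correct(1, 1, 4))
```

With the values of the example, row 2 holds two errors and is not correctable on its own; once entry (2,3) is declared an erasure, the row holds one error and one erasure and becomes correctable.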

Details of the implementation can be found in [2].

Abstract — A decoding algorithm for linear codes
over $Z_4 = \{0, 1, 2, 3\}$, the ring of integers modulo 4, is
given which returns the codeword that is closest to the
received vector in Lee distance.


I.  Introduction 

A linear code over $Z_4$ is the same as a group code over the 4-element
cyclic group and can be defined by a check matrix [1].
The algorithm proposed is similar to the one given in [2] for
soft-decision decoding of binary linear codes. First, a trellis is
constructed from the check matrix of the linear code over $Z_4$
under consideration using Wolf’s trellis construction [3]. There
is a one-to-one correspondence between the set of paths from
the start node to the goal node and the set of codewords.
Hence, the problem of decoding is the same as finding the path
in the trellis which is closest to the received vector in Lee
distance. The search is guided by an evaluating function f =
g + h defined on each node, where g depends only on the past
and h (called the heuristic function) is an estimate over the set of
possible futures. The node with the minimum value of f is given
first priority for expansion. The efficiency of the algorithm
depends most importantly on the complexity of
the heuristic function. We define a heuristic function which
can be easily computed with a worst-case complexity of 4n
searches over the Lee weight distribution, where n is the length
of the code.


II. Heuristic function 'h' and cost function 'f'

Let $r = (r_0, r_1, \ldots, r_{n-1})$ be the received vector. A cost function
$f(m,t)$ for any node m at level t ($0 \le t \le n-1$) is
defined by

$$f(m,t) = g(m,t) + h(m,t) \qquad (1)$$

where $g(m,t)$ and $h(m,t)$ are defined as follows:

$$g(m,t) = \sum_{i=0}^{t} LW(r_i - c_i) \qquad (2)$$

where $LW(x)$ = Lee weight of x and $c_p(t) = (c_0, c_1, \ldots, c_t)$ is
the path leading to that node, and

$$h(m,t) = \min_{x \in X(t)} \sum_{i=t+1}^{n-1} LW(r_i - x_i) \qquad (3)$$

where

$$X(t) = \{x = (c_0, c_1, \ldots, c_t, x_{t+1}, \ldots, x_{n-1}) \mid x_{t+1}, \ldots, x_{n-1} \in Z_4,\ LW(x) \in L_s\}$$

and $L_s$ is the set of all Lee weights of the
codewords. The decoding algorithm given below gives the
codeword which is closest to the received vector in Lee distance.
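As a small illustration of the past-cost term g, the sketch below computes Lee weights over $Z_4$ and sums them along a partial path. The function names `lee_weight` and `g_value` are illustrative only and do not come from the paper.

```python
def lee_weight(x):
    """Lee weight of a single Z4 symbol: 0 -> 0, 1 -> 1, 2 -> 2, 3 -> 1."""
    x %= 4
    return min(x, 4 - x)


def lee_weight_vector(v):
    """Lee weight of a vector over Z4 (sum of the symbol weights)."""
    return sum(lee_weight(x) for x in v)


def g_value(received, path):
    """Past cost g(m, t): Lee distance between the path labels c_0..c_t and r_0..r_t."""
    return sum(lee_weight(r - c) for r, c in zip(received, path))


if __name__ == "__main__":
    r = (1, 1, 2)
    print([lee_weight(x) for x in range(4)])
    print(g_value(r, (1, 3)))
```

Because `zip` stops at the shorter argument, `g_value` only looks at the first t+1 received symbols, which matches the fact that g depends only on the past.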


III.  The  Decoding  Algorithm 

Step  1:  Create  a  list  called  OPEN  and  let  start  node  be  the 
only  element  in  OPEN. 

Step  2:  Select  and  remove  the  first  node  from  OPEN  and  call 
it  node  m.  If  m  is  the  goal  node  exit  successfully,  and 
the  path  history  of  node  m  is  the  output  of  the  decoder. 


Step  3:  Expand  node  m,  generating  next  level  nodes  which 
are  successors  of  the  node  m.  This  expanding  operation 
consists  of 

(a)  Obtaining  all  successor  nodes  and  computing  g  and 

h  values  of  all  the  successor  nodes. 

(b) For each successor, storing the path followed
so far (called the path history) from the start node.

(c) Storing all successors in OPEN.

(d) Arranging the nodes in OPEN in increasing order of their
f value. (For nodes with equal values of f, arrange
them in decreasing order of their levels.
For nodes with equal values of f at the
same level, arrange them in increasing order of their
g value.)

Step  4:  Go  to  Step  2. 
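Steps 1 to 4 can be sketched with a priority queue standing in for the sorted OPEN list. This is a simplified illustration under stated assumptions: it walks a code tree built from an explicit list of codewords instead of Wolf's trellis, and the heuristic defaults to h = 0 (always admissible, but weaker than the h of this paper). The name `decode_best_first` is hypothetical.

```python
import heapq


def lee_weight(x):
    x %= 4
    return min(x, 4 - x)


def decode_best_first(codewords, received, h=lambda prefix: 0):
    """Best-first search over codeword prefixes, ordered as in Step 3(d).

    Heap entries are (f, -level, g, path): smallest f first, then the deepest
    level, then the smallest g. `codewords` is a list of equal-length tuples.
    """
    n = len(received)
    open_list = [(h(()), 0, 0, ())]          # Step 1: OPEN holds the start node
    while open_list:
        f, _, g, path = heapq.heappop(open_list)   # Step 2: take the first node
        if len(path) == n:                          # goal node reached
            return path, g
        t = len(path)
        # Step 3: successors are the one-symbol extensions that are codeword prefixes
        symbols = sorted({c[t] for c in codewords if c[:t] == path})
        for s in symbols:
            child = path + (s,)
            g_child = g + lee_weight(received[t] - s)
            heapq.heappush(open_list, (g_child + h(child), -len(child), g_child, child))
    return None


if __name__ == "__main__":
    code = [(a, a, a) for a in range(4)]     # toy length-3 repetition code over Z4
    print(decode_best_first(code, (1, 1, 2)))
```

Storing the whole path in each heap entry plays the role of the path history in Step 3(b); a trellis-based implementation would store a back-pointer instead.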

IV.  A  SIMPLE  PROCEDURE  TO  CALCULATE  h(m,t) 
The  following  theorem  leads  to  a  simple  procedure  which  gives 
the  value  of  h(m,t)  without  actually  carrying  out  the  mini¬ 
mization. 

Theorem 1 For a chosen node m, which is say at level t,
let $\mathbf{c}_t = (c_0, c_1, \ldots, c_t)$ and $l_c = LW(\mathbf{c}_t)$. Also let
$\mathbf{r}_t = (r_{t+1}, \ldots, r_{n-1})$ and $l_r = LW(\mathbf{r}_t)$. Then $h(m,t) = |h^*|$
(the absolute value of $h^*$), where $h^*$ is the least integer such that
$l_c + l_r \pm h^* \in L_s$.
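A direct way to use Theorem 1 is to scan candidate values of h upward from zero and stop at the first one for which $l_c + l_r + h$ or $l_c + l_r - h$ is a codeword Lee weight. The sketch below does this; the function name and the choice to return `None` when no candidate matches are assumptions made for illustration.

```python
def heuristic_h(l_c, l_r, lee_weights, n, t):
    """Smallest h in 0..2(n-t-1) with l_c + l_r + h or l_c + l_r - h in lee_weights.

    l_c: Lee weight of the path labels up to level t
    l_r: Lee weight of the remaining received symbols r_{t+1}..r_{n-1}
    lee_weights: set of Lee weights occurring among the codewords (L_s)
    """
    target = l_c + l_r
    for h in range(0, 2 * (n - t - 1) + 1):
        if target + h in lee_weights or target - h in lee_weights:
            return h
    return None  # no admissible weight within range


if __name__ == "__main__":
    # Toy weight set: codewords of Lee weight 0, 4 or 8 only.
    print(heuristic_h(1, 2, {0, 4, 8}, n=4, t=1))
```

Storing `lee_weights` as a Python set makes each membership test constant time on average, so the cost per node is governed by the number of candidate values tried.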

For any node m, the possible values of h(m,t) are
0, 1, 2, ..., 2(n-t-1). From Theorem 1, it follows that one can
find h(m,t) for each node by successively trying the values
from 0 to 2(n-t-1), checking at each value whether
$l_c + l_r \pm h(m,t) \in L_s$, and stopping at the first value
for which this holds. Clearly, the worst case for the
matching effort is the start node, for which the number of
matching efforts may be 2n. The complexity of each matching
search depends on the Lee weight distribution of the
code. If the code has codewords of specific weights only, then
the search becomes simple. For instance, for constant Lee
weight codes the search only tests for that constant weight or
zero. If the minimum Lee weight is known, then one checks only
for zero and all weights from the minimum Lee weight to
twice the length of the code, the largest Lee weight a vector over
$Z_4$ can have. In the absence of any knowledge
of the Lee weight distribution, one is compelled to check for
all weights from zero to twice the length of the code.

Abstract — We present a new soft-decision decoding
algorithm,  Modified  A*  (MA*),  that  conducts  heuristic 
search  through  a  code  tree  for  a  binary  ( n ,  k)  linear 
code.  MA*  improves  on  the  results  obtained  earlier 
using  Algorithm  A*.  We  also  describe  the  applica¬ 
tion  of  the  simulated  annealing  (SA)  algorithm  to  the 
decoding  problem,  transformed  into  a  continuous  op¬ 
timization  problem. 

Summary 

In MA*, search is guided by an evaluation function f defined
to take advantage of the information provided by the received
vector and the inherent properties of the transmitted code.
The algorithm maintains a list $\mathcal{L}$ of nodes of the code tree
that are candidates to be expanded. The algorithm selects for
expansion the node in $\mathcal{L}$ with the minimum value of the function f.
If it selects a goal node for expansion, it has found an “optimal”
path from the start node to the goal node whose labels
correspond to a codeword that minimizes the error probability
when we assume all codewords have equal probability of being
transmitted. For every node m of the code tree visited by the
algorithm, MA* keeps two values, f(m) and $low_l(m)$, where
l is a fixed non-negative integer, whereas A* keeps only one
value f(m) [1]; $low_l(m)$ is a new lower bound on the cost of
an optimal path that goes through node m. This algorithm
keeps an upper bound, UB, on the value of $low_l$ for every node
in an optimal path. If the value of $low_l$ for a node is larger
than or equal to UB, no further search through this node is
necessary and the node can be discarded.

If no restriction is placed on the size of list $\mathcal{L}$, then the MA*
decoding algorithm is a maximum-likelihood soft-decision
(MLSD) decoding algorithm. In our sub-optimal soft-decision
(SOSD) decoding algorithm, we limit the size of list $\mathcal{L}$ according
to the following criterion. If a node m needs to be stored in
list $\mathcal{L}$ when the size of list $\mathcal{L}$ has reached a given upper bound
$M_B$, then we discard whichever has the larger f value:
node m or the node in list $\mathcal{L}$ with the maximum f value.
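The list-size criterion can be illustrated with a small sketch. Nodes are represented here as `(f, label)` tuples and the function name `store_with_limit` is hypothetical. The text does not say what happens when the two f values are equal; this sketch assumes the incoming node is the one dropped in that case.

```python
def store_with_limit(node_list, node, max_size):
    """Store `node` = (f, label) in node_list, keeping at most max_size entries.

    Returns the node that was discarded, or None if nothing was discarded.
    Tie-breaking (equal f values) drops the incoming node; that is an assumption.
    """
    if len(node_list) < max_size:
        node_list.append(node)
        return None
    worst = max(node_list, key=lambda nd: nd[0])   # stored node with maximum f
    if node[0] >= worst[0]:
        return node                                 # incoming node is the larger one
    node_list.remove(worst)
    node_list.append(node)
    return worst


if __name__ == "__main__":
    stored = [(1.0, "a"), (5.0, "b")]
    print(store_with_limit(stored, (3.0, "c"), max_size=2), stored)
    print(store_with_limit(stored, (9.0, "d"), max_size=2), stored)
```

A linear scan for the maximum keeps the sketch short; a decoder that needs both the minimum-f node (for expansion) and the maximum-f node (for pruning) quickly would more likely use a double-ended priority structure.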

To verify the performance of our SOSD decoding algorithm,
we show simulation results for the (104, 52) code and for the
(256, 131) code when these codes are transmitted over AWGN
channels, with $M_B = 1000$ and l = 4. From Figure 1, for
the (104, 52) code the performance of our SOSD decoding algorithm
is within 0.15 dB of the lower bound of the performance
of the MLSD decoding algorithm. Thus, for the samples
tried, limiting $M_B$ to 1000 introduced only a small degradation
in the performance of the algorithm. In Table 1, for the
(256, 131) code the results were obtained by simulating 35,000
samples. No decoding error occurred during simulation. For
the examples tried, the average number of codewords constructed
is insignificant compared with the total number of
codewords. In Table 1, N(r) = number of nodes visited, C(r)
= number of codewords constructed, M(r) = number of nodes

1This  work  was  partially  supported  by  the  NSF  under  Grant 
NCR-9205422.  C.  R.  Wulff  was  supported  by  a  Research  Experience 
for  Undergraduates  Supplement  of  Grant  NCR-9205422. 


stored in list $\mathcal{L}$, max = maximum value among samples tried,
ave = average value among samples tried, and $\gamma_b$ = SNR per
transmitted information bit.


Figure 1: Performance of the MA* SOSD decoding algorithm for
the (104,52) binary extended quadratic residue code

Table 1: Performance of the MA* SOSD decoding algorithm for
the (256,131) binary extended BCH code

When the decoding problem is transformed into a continuous
optimization problem [2], it becomes that of finding a k-dimensional
real vector that minimizes the cost function g. SA, a
technique that statistically guarantees finding global optima
for optimization problems, could be applied to solve this problem.
SA uses a control parameter called temperature (T),
which is initially high and is decreased steadily. At each temperature,
a large number of possible “moves” are generated,
evaluated, and possibly accepted. Each move effects a small
change in the current “configuration” (a real vector), and may
be obtained by perturbing one component of the current vector
by a small quantity. A move is accepted if it decreases the
cost g. A move that results in an increase of $\Delta g$ in g is
still accepted, with probability $e^{-\Delta g / T}$. This provides
a mechanism to escape from local (non-global) optima,
with higher probability at higher temperatures. There is a
high likelihood that the system state moves to the region of
the global optimum before the temperature becomes too low.
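The acceptance rule and cooling loop described above can be sketched generically. The decoding cost function g of [2] is not given here, so the example minimizes a toy quadratic cost instead; the geometric cooling schedule, step size, and all parameter names are assumptions chosen for illustration.

```python
import math
import random


def accept_probability(delta_g, temperature):
    """Downhill moves are always accepted; uphill moves with probability exp(-delta_g/T)."""
    if delta_g <= 0:
        return 1.0
    return math.exp(-delta_g / temperature)


def anneal(cost, x0, t_start=1.0, t_end=1e-3, cooling=0.95,
           moves_per_temp=50, step=0.1, seed=0):
    """Simulated annealing over real vectors; returns the best vector seen and its cost."""
    rng = random.Random(seed)
    x, g = list(x0), cost(x0)
    best_x, best_g = list(x), g
    temperature = t_start
    while temperature > t_end:
        for _ in range(moves_per_temp):
            cand = list(x)
            i = rng.randrange(len(cand))
            cand[i] += rng.uniform(-step, step)      # perturb one component slightly
            g_cand = cost(cand)
            if rng.random() < accept_probability(g_cand - g, temperature):
                x, g = cand, g_cand
                if g < best_g:
                    best_x, best_g = list(x), g
        temperature *= cooling                        # lower the temperature steadily
    return best_x, best_g


if __name__ == "__main__":
    toy_cost = lambda v: sum((vi - 1.0) ** 2 for vi in v)   # stand-in for the cost g
    print(anneal(toy_cost, [0.0, 0.0, 0.0]))
```

Keeping track of the best configuration seen so far is a common practical addition; the description above only requires the current configuration.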
When $T \approx 0$, the algorithm settles into the current local optimum.
Simulation results for the SA decoding algorithm will
be presented.

Abstract — Maximum likelihood decoding (MLD) of
binary linear block codes is addressed by combining
the approaches of processing the generator matrix G
and parity-check matrix H.

I.  MLD  BASED  ON  ORDERING  IN  THE  DUAL  SPACE 

Consider a binary linear $(N, K, d_H)$ code with generator
matrix G and check matrix H. For a given received sequence,
let $B_K$ be the most reliable basis (MRB) [2], [3], [4] for the
column space $cs(G) = GF(2^K)$ of G and let $Q_{N-K}$ be the
least reliable basis (LRB) [1] for $cs(H) = GF(2^{N-K})$ of H,
consisting of the columns of the respective matrices.
Theorem 1: The complement of the location set of $B_K$ is
the location set of $Q_{N-K}$. □
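Theorem 1 can be checked numerically on a small code. The sketch below assumes a systematic pair G = [I | P], H = [P^T | I], distinct reliabilities, and the usual greedy constructions (scan positions from most reliable for the MRB, from least reliable for the LRB, keeping a column whenever it is independent of those already kept). The (7,4) Hamming code, the reliability values, and all function names are illustrative choices, not taken from the paper.

```python
def greedy_basis(columns, order):
    """Positions, scanned in `order`, whose columns are linearly independent over GF(2).

    Columns are integers used as bit vectors.
    """
    reduced, chosen = [], []
    for pos in order:
        v = columns[pos]
        for b in reduced:
            v = min(v, v ^ b)      # removes the leading bit of b from v when present
        if v:
            reduced.append(v)
            chosen.append(pos)
    return chosen


def columns_of_g_and_h(P):
    """Columns of G = [I_k | P] and H = [P^T | I_r] for a k x r binary matrix P."""
    k, r = len(P), len(P[0])
    g_cols = [1 << j for j in range(k)]
    g_cols += [sum(P[i][c] << i for i in range(k)) for c in range(r)]
    h_cols = [sum(P[j][c] << c for c in range(r)) for j in range(k)]
    h_cols += [1 << c for c in range(r)]
    return g_cols, h_cols


def mrb_and_lrb(P, reliability):
    g_cols, h_cols = columns_of_g_and_h(P)
    n = len(reliability)
    most_first = sorted(range(n), key=lambda j: -reliability[j])
    least_first = sorted(range(n), key=lambda j: reliability[j])
    return (sorted(greedy_basis(g_cols, most_first)),
            sorted(greedy_basis(h_cols, least_first)))


if __name__ == "__main__":
    P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]      # (7,4) Hamming code
    rel = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6]             # made-up reliabilities
    print(mrb_and_lrb(P, rel))
```

For this example the MRB occupies positions {0, 2, 3, 4} and the LRB positions {1, 5, 6}, which are complementary as the theorem states.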

An efficient way to perform (nearly) MLD starts with forming
$c_0$, the codeword that agrees with bit-by-bit hard detection
at the positions of the MRB. Thereafter, search procedures [2],
[3], [4] examine alternatives to $c_0$. In [2], the alternatives to $c_0$
are considered in successive stages. At each order i of reprocessing,
codewords are processed. A resource test tightly
related to the reprocessing strategy reduces the number of
computations at each decoding stage. A similar approach [4]
utilizes a partial ordering of the information vectors. Syndrome
decoding [5] is an alternative approach to accomplish
MLD.

By  Theorem  1,  the  resource  tests  can  be  related  to  the 
LRB.  Also,  those  syndrome  decoding  aspects  that  are  based 
on  the  LRB  may  conveniently  be  incorporated  into  the  decod¬ 
ing  procedure. 

II.  Syndrome  decoding  aspects 

Let $y_i \in Q_{N-K}$, $i = 1, 2, \ldots, N-K$, be indexed in nondecreasing
order of reliability. Let $s = H^t c_0$ be the syndrome
corresponding to $c_0$. Assuming $s \ne 0$, expand s in terms of
the LRB, i.e., $s = \sum_{j=1}^{w} y_{p_j}$, where $p_1 > p_2 > \cdots > p_w$.

Setting $H = [A \; I_{N-K}]$, with the $N-K$ rightmost positions
corresponding to the LRB, w is the Hamming weight of s.

By [1], if either a) $w = 1$ or b) $p_1 + w < d_H$, then $c_0$ is
the most likely codeword. A stopping rule stronger than b)
now follows.

Theorem 2: If $w \ge 2$ and

$$\max_{l \in [2, w]} \{p_l + 2(l-1) + 1\} < d_H, \qquad (1)$$

then order-0 reprocessing is optimum. □

Generalization of Theorem 2 to higher orders i of reprocessing
is also presented. By such extension, we associate to
either  each  s  or  the  most  likely  syndromes  a  set  of  columns 
of  H  to  be  searched,  as  in  [1].  We  provide  an  efficient  algo¬ 
rithm  to  preprocess  the  corresponding  table  look-up.  The  size 
of  this  table  can  be  limited  to  the  most  likely  error  patterns 

1  Supported  in  part  by  NSF  Grant  NCR-94-15374. 

2  Supported  in  part  by  the  Israel  Science  Foundation  adminis¬ 

tered  by  the  Israel  Academy  of  Sciences  and  Humanities. 


using  the  statistical  approach  of  [2].  Finally,  we  present  an  al¬ 
gorithm  which  iteratively  evaluates  the  syndrome  s  each  time 
a  dimension  is  added  when  constructing  the  LRB.  With  this 
algorithm,  the  most  likely  error  patterns  are  tested  without 
completing  the  construction  of  the  LRB. 

Syndrome-based  tests  stop  the  search  more  effectively  for 
some  received  words  (typically  when  the  signal  to  noise  ra¬ 
tio  (SNR)  is  low).  However,  most  of  the  syndrome  tests  are 
code-dedicated,  whereas  resource  tests  are  more  universal. 

III.  Simulation  results 

For extended Hamming codes of length $2^m$, $m \le 7$, with
order-1 reprocessing and table look-up, the maximum number
of computations $N_{tot}$ is compared in Table 1 with the worst-case
results of both [2] and [1] (the latter is MLD). We also
indicate the partial ordering maximum cost $N_{ord}$. The average
number of computations $N_{ave}$ rapidly converges to $N_{ord}$
as the SNR increases. For the (24,12,8) Golay code, our decoding
method requires on average 50 and 15 real operations
to achieve practically optimum error performance at the respective
BERs $10^{-3}$ and $10^{-6}$.

Finally,  a  new  reprocessing  algorithm  is  analyzed.  After  the 
ordering  has  been  completed,  this  algorithm  no  longer  requires 
real  value  operations.  For  all  simulated  codes,  a  performance 
within  1.5  dB  of  the  optimum  bit  error  performance  has  been 
achieved, even for long codes. For example, at the BER $10^{-5}$,
with $o(K^3)$ syndrome computations, a degradation of less
than a dB with respect to the ML performance is achieved for
the (128,64,22) extended BCH code.

Table 1: Computation cost for extended Hamming codes.

m   code          N_ord   N_tot   [1]   order-1
3   (8,4,4)       15      27      17    36
4   (16,11,4)     33      68      60    108
5   (32,26,4)     63      153     188   290
6   (64,57,4)     113     330     -     726
7   (128,120,4)   199     703     -     1,736

Abstract — A fast algorithm for the evaluation of
error magnitudes for Reed-Solomon codes is obtained here
in terms of the error locations and syndromes. This fast
algorithm is compared to the Forney algorithm in terms
of required additions and multiplications and
implementation speed.


I.  INTRODUCTION 

Assume a t-error-correcting Reed-Solomon code and
assume that the error locations have been determined using
the Berlekamp-Massey algorithm or some other procedure.
The Forney algorithm [1] is the common algorithm used
for obtaining the error magnitudes in Reed-Solomon
decoding. For a codeword with $v \le t$ received errors, the
Forney algorithm calculates the error magnitudes from the
error locations $\beta_i$, $i = 1, 2, \ldots, v$, and the syndromes $S_i$,
$i = 1, 2, \ldots, v$, as [2,3]

$$Y_i = \frac{\Omega(\beta_i^{-1})}{\prod_{j \ne i} (1 + \beta_j \beta_i^{-1})}, \quad i = 1, 2, \ldots, v \qquad (1)$$

where $\Lambda(X)[1 + S(X)] = \Omega(X) \bmod X^{2v+1}$,
$S(X) = S_1 X + S_2 X^2 + \cdots + S_t X^t$, and
$\Lambda(X) = (1 + \beta_1 X)(1 + \beta_2 X) \cdots (1 + \beta_v X) = 1 + \Lambda_1 X + \Lambda_2 X^2 + \cdots + \Lambda_v X^v$.
Now $\Omega(X)$ can be expressed as [4]
$\Omega(X) = 1 + (S_1 + \Lambda_1) X + (S_2 + \Lambda_1 S_1 + \Lambda_2) X^2 + \cdots + (S_v + \Lambda_1 S_{v-1} + \Lambda_2 S_{v-2} + \cdots + \Lambda_{v-1} S_1 + \Lambda_v) X^v$.
The number of required additions is $(5v^2 - v)/2$ and the number of
required multiplications is $(7v^2 - 5v)/2$. In addition, there are $v(v-1)$
exponentiations.

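In general, once the error locations have been
determined, the error magnitudes $Y_i$, $i = 1, 2, \ldots, v$, are
obtained by solving

$$\begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_v \\ \beta_1^2 & \beta_2^2 & \cdots & \beta_v^2 \\ \vdots & \vdots & & \vdots \\ \beta_1^v & \beta_2^v & \cdots & \beta_v^v \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_v \end{bmatrix} = \begin{bmatrix} S_1 \\ S_2 \\ \vdots \\ S_v \end{bmatrix} \qquad (2)$$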


II.  FAST  ALGORITHM 

Since  the  B  matrix  is  a  Vandermonde  matrix  it  is  of  full 
rank  and  standard  techniques  can  be  used  to  diagonalize  it. 
Then  using  back  substitution  and  making  full  use  of  the 
structure  of  a  Vandermonde  matrix,  we  developed  the 

This  work  was  supported  in  part  by  a  grant  from  the  US 
Army  Research  Office  under  the  Focused  Research  Initiative. 


following  iterative  algorithm  for  obtaining  the  error 
magnitudes 


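$$S_{v-j} = S_{v-j} - \beta_i S_{v-j-1}, \quad j = 0, \ldots, v-i-1, \quad i = 1, \ldots, v-2$$

$$S_v = (S_v - \beta_{v-1} S_{v-1}) / (\beta_v - \beta_{v-1})$$

For $i = 1, \ldots, v-2$:

$$S_{v-i} = S_{v-i} - S_{v-i+j}, \quad j = 1, \ldots, i$$

$$S_{v-i} = S_{v-i} / (\beta_{v-i} - \beta_{v-i-1})$$

$$S_{v-i+j} = S_{v-i+j} / (\beta_{v-i+j} - \beta_{v-i-1}), \quad j = 1, \ldots, i$$

$$S_1 = S_1 - S_{j+1}, \quad j = 1, \ldots, v-1$$

$$S_j = S_j / \beta_j \qquad (3)$$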


where the error magnitudes $Y_i$ are contained in the $S_i$.
The number of required additions is $3v(v-1)/2$ and the number of
required multiplications is $v^2$.
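For illustration, the Vandermonde system (2) can be solved with the classical Björck-Pereyra recursion, which has the same structure as the procedure described here (an elimination sweep followed by a sweep of divisions and subtractions, then a final division by each location). The sketch below is not claimed to reproduce recursion (3) line by line, and it works over a prime field GF(p) purely to keep the arithmetic short; a Reed-Solomon decoder would normally work over GF(2^m). All names are hypothetical.

```python
def syndromes(beta, Y, p):
    """S_k = sum_i Y_i * beta_i^k (mod p) for k = 1..v."""
    v = len(beta)
    return [sum(y * pow(b, k, p) for b, y in zip(beta, Y)) % p for k in range(1, v + 1)]


def error_magnitudes(beta, S, p):
    """Solve sum_i Y_i beta_i^k = S_k, k = 1..v, over GF(p) with distinct nonzero beta_i."""
    v = len(beta)
    b = [s % p for s in S]
    inv = lambda a: pow(a % p, p - 2, p)          # Fermat inverse, p prime

    # Sweep 1: eliminate with the locations one at a time.
    for k in range(v - 1):
        for i in range(v - 1, k, -1):
            b[i] = (b[i] - beta[k] * b[i - 1]) % p

    # Sweep 2: divisions by location differences, then subtractions.
    for k in range(v - 2, -1, -1):
        for i in range(k + 1, v):
            b[i] = b[i] * inv(beta[i] - beta[i - k - 1]) % p
        for i in range(k, v - 1):
            b[i] = (b[i] - b[i + 1]) % p

    # b now holds Y_i * beta_i; one last division per location gives Y_i.
    return [b[i] * inv(beta[i]) % p for i in range(v)]


if __name__ == "__main__":
    p = 257
    beta, Y = [3, 10, 55], [7, 100, 200]
    print(error_magnitudes(beta, syndromes(beta, Y, p), p))
```

Counting operations in this sketch gives v(v-1)/2 multiplications in the first sweep and v(v-1)/2 + v divisions afterwards, v^2 in total, and 3v(v-1)/2 additions or subtractions, which agrees with the counts quoted above.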

This fast algorithm for evaluating the error magnitudes
for Reed-Solomon decoding requires fewer additions than the
Forney algorithm by a factor of approximately 5/3 and fewer
multiplications by a factor of approximately 7/2, and it needs
no exponentiations. The total
number of operations, including additions, multiplications,
and exponentiations, is $7v^2 - 4v$ for the Forney algorithm
and $(5v^2 - 3v)/2$ for the fast algorithm. Also, the memory
required for the Forney algorithm and the fast algorithm
are both small and essentially equal to the number of error
magnitudes v. Thus, the fast algorithm calculates the
error magnitudes faster than the Forney algorithm by a
factor ranging from approximately 1.67 to 3.5. If all
operations require the same time, the speedup factor is
approximately 2.8.
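The quoted totals and the speedup of about 2.8 follow directly from the per-operation counts, as the short calculation below shows. The function names are illustrative only.

```python
def forney_operations(v):
    """Operation counts quoted for the Forney algorithm with v errors."""
    return {"add": (5 * v * v - v) // 2,
            "mul": (7 * v * v - 5 * v) // 2,
            "exp": v * (v - 1)}


def fast_operations(v):
    """Operation counts quoted for the fast algorithm with v errors."""
    return {"add": 3 * v * (v - 1) // 2, "mul": v * v, "exp": 0}


def total(ops):
    return sum(ops.values())


if __name__ == "__main__":
    for v in (2, 5, 10, 100):
        f, q = total(forney_operations(v)), total(fast_operations(v))
        print(v, f, q, round(f / q, 3))
```

The ratio of totals is (7v^2 - 4v) / ((5v^2 - 3v)/2), which tends to 14/5 = 2.8 as v grows.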

A comparison of the execution times for calculating
the error magnitudes using the Forney algorithm and the
fast algorithm was performed for a length-1023 Reed-Solomon
code with $v = t = 1, 2, \ldots, 10$. It was shown that for
this case the fast algorithm is at least a factor of two faster
than the Forney algorithm.

Abstract — Based on an analysis of a property
of a class of cyclic codes, an algorithm for neural soft-decision
decoding of those codes is presented. The
complexity of the new algorithm is much less than
that of the available algorithms for decoding general
linear block codes, and its performance approaches
that of maximum likelihood decoding.

SUMMARY 

The adaptability and parallel computing capability
of neural networks make them especially suitable
for error-correcting tasks. Several neural decoding
schemes have been proposed. It is now well
known that neural networks can be employed in soft-decision
(SD) decoding; however, the decoding complexity
calls for further study.

For an ordinary (n, k) binary linear block code,
most available neural decoding implementations perform
SD decoding by searching for a codeword at
minimal distance from the received vector
$r = (r_1, r_2, \ldots, r_n)$ in the whole code space C consisting
of $2^k$ elements, so the decoding complexity becomes
large as $2^k$ increases. We can define the decoding
complexity as the number of elements in the decoder's
search set. So the complexity of an ordinary neural
decoder for an (n, k) code is $2^k$. For a class of cyclic
codes, we trade a slight degradation in performance
for reduced decoding complexity by using a property
of these codes.

Consider a systematic cyclic code
whose error-correcting capability is $t \le \lfloor (d-1)/2 \rfloor$ for a
hard-decision (HD) decoder. Encoding is described
in the group ({1, -1}, x); the encoding equation is
$c_i = \prod_{j=1}^{k} b_j^{g_{ji}}$, where $\{g_{ji}\}$ is the generator matrix.
In each codeword, the first k bits are the information
bits, which correspond to a set $I = \{1, 2, \ldots, k\}$,
and the other $n - k$ bits correspond to a set
$Q = \{k+1, k+2, \ldots, n\}$. Define a weight $W_A(e) = \sum_{i \in A} e_i$,
where A is a subset of $\{1, 2, \ldots, n\}$. We can prove
the following theorem:

Theorem 1 Let r be a received vector of the
systematic cyclic code, and let $e = c \oplus b$ be an error vector
that can be corrected by SD decoding, so that
$W(e) = \sum_{i=1}^{n} e_i \le d - 1$. Then the number of error bits in I
can always be reduced to

$$W_I(e) = \sum_{i=1}^{k} e_i \le t \qquad (1)$$

by cyclically shifting the vector r, if and only if

Using the above property, we obtain a simplified decoding
implementation for those codes. The new implementation
is as follows:

1) Cyclically shift r m times to get r*, such that
$W_I(r^*) = \sum_{i=1}^{k} r_i^*$ is minimized;

2) Determine the hard-decision vector b* of r*;

3) Encode $b_I^* = (b_1^*, \ldots, b_k^*)$ into a codeword c*,
where $c_i^* = \prod_{j=1}^{k} (b_j^*)^{g_{ji}}$;

4) $r' = c^* \oplus r^*$, where $r'_i = r_i^* c_i^*$;

5) Decode r' to obtain a codeword c' using a neural
decoder;

6) $c^* = c' \oplus c^*$;

7) Cyclically shift c* $n - m$ times to get a codeword
c, which is the result of decoding.

In the first step, we get r* with minimum weight
$W_I(r^*)$. An approximate assumption is that r* satisfies
(1). Based on this assumption, the number of
error bits in I of r* is no more than t, so we get
$W_I(c') \le t$. So, the decoder in the fifth step searches
for the desired codeword only in a subset $S_n$ of C consisting
of the codewords c' such that $W_I(c') \le t$, rather
than in the whole codeword space C. This decoder
is called a “narrow sense decoder (NSD)”. The number
of codewords in $S_n$ is $N_n = \sum_{i=1}^{t} \binom{k}{i}$, so the complexity
of the NSD is $N_n$. The complexity of the new decoder
is a little larger than $N_n$, so the proposed decoder is
much simpler than the ordinary decoder. Simulation
results indicate that the performance is close to that
of a maximum likelihood decoder.

Abstract — This paper analyzes excursions of adaptive
algorithms. The distribution of the number of
excursions in n units of time is approximated by a
Poisson distribution. The mean and distribution of
the time of the occurrence of the first excursion are
approximated by those of an exponential distribution.
Expressions for the error in the approximations
are derived. The approximations are shown to hold
asymptotically as the excursion defining set converges
to the empty set and as the algorithm’s step size $\mu$
converges to zero. The validity of the approximations
is tested on a variety of examples.

I.  Introduction 

We study excursions of adaptive algorithms of the form

$$W_{k+1} = W_k - \mu\, h(W_k, X_k, D_k), \qquad (1)$$

where $X_k$ and $D_k$ are real-valued random variables, $\mu$ is a constant
known as the algorithm’s step size, and h is a measurable
function.

The updates of the error between estimated and optimal weights for many
adaptive filters (for example, the least mean square (LMS) algorithm and
its “signed” variants) are of the form of Eq. (1). When one of these
filters is driven by an i.i.d. sequence of inputs $\{X_k\}$ and an
independent i.i.d. sequence of disturbances $\{D_k\}$, Eq. (1) defines a
discrete time Markov chain. The performance of an algorithm is acceptable
if its corresponding Markov chain spends most of its time in a
neighborhood of the equilibrium 0 (or preferably at 0). However, on rare
occasions, an excursion (a visit or a cluster of visits to the set
$B = [b, \infty)$ or the set $B = [-b, b]^c$) will occur.

Denote the time of the beginning of the first excursion by $t_b$ and the
number of excursions in n units of time by $S_n$. We approximate the
expectation of $t_b$, the distribution of $t_b$, and the distribution of
$S_n$. The distribution and the mean of $t_b$ are approximated by those
of an exponential distribution with mean $1/(\pi(B)\theta)$, and the
distribution of $S_n$ by a Poisson distribution with mean
$\lambda\theta$, where $1/\theta$ is the mean clump size given that there
is an excursion, $\lambda = n\pi(B)$, and $\pi$ is the stationary
distribution of the chain.
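
The quantities above can be explored numerically. The following is a minimal sketch, not the authors' procedure: it assumes an LMS-type update $h(w, x, d) = x(wx - d)$ with standard Gaussian inputs and disturbances, takes $B = [b, \infty)$, and separates clusters only by a gap of at least r steps outside B (the return-to-0 rule is omitted). All function names and parameter values are hypothetical.

```python
import math
import random


def lms_error_path(n, mu, seed=0):
    """Simulate W_{k+1} = W_k - mu * h(W_k, X_k, D_k) with an assumed
    LMS-type h(w, x, d) = x * (w * x - d) and i.i.d. N(0, 1) X_k, D_k."""
    rng = random.Random(seed)
    w, path = 0.0, []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        d = rng.gauss(0.0, 1.0)
        w = w - mu * x * (w * x - d)
        path.append(w)
    return path


def count_excursions(path, b, r):
    """Count clusters of visits to B = [b, inf); a new cluster starts only
    after at least r consecutive steps outside B."""
    count, gap = 0, r
    for w in path:
        if w >= b:
            if gap >= r:
                count += 1
            gap = 0
        else:
            gap += 1
    return count


def first_excursion_time(path, b):
    """Index of the first visit to B, or None if B is never visited."""
    for k, w in enumerate(path):
        if w >= b:
            return k
    return None


def poisson_check(runs, n, mu, b, r):
    """Compare the empirical P(S_n = 0) with exp(-mean count)."""
    counts = [count_excursions(lms_error_path(n, mu, seed=s), b, r)
              for s in range(runs)]
    mean = sum(counts) / runs
    return mean, counts.count(0) / runs, math.exp(-mean)


if __name__ == "__main__":
    print(poisson_check(runs=200, n=2000, mu=0.05, b=0.8, r=20))
```

If the Poisson approximation is reasonable for the chosen b and $\mu$, the last two returned values should be close; how close depends on how rare the excursions are.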

II. Excursion analysis as $B \to \emptyset$

Lattice state space case: Let $\{W_k\}_{k \ge 0}$ be an irreducible,
positive recurrent, aperiodic Markov chain with a countable state space S
(e.g. the even steps of the sgn-sgn variant of the LMS algorithm). Define
an excursion to be a cluster of visits to the set B that is separated
from the previous cluster by a visit to state 0 or by r visits to $B^c$
for some integer r. Dividing the n steps of the chain into independent
cycles that start from state 0 and end at state 0, calculating an upper
bound for the probability that a cycle contains more than one cluster,
and using the law of rare events [1, page 117] produces the desired
approximation for $S_n$ [2, theorems 3.1 and 3.2].

The approximation for the distribution of $t_b$ can be derived from the
fact that $P(t_b > x) = P(S_{\lfloor x \rfloor} = 0)$ and the
approximation derived for $S_n$. The approximation for $E\,t_b$ is given
in [2, lemma 3.3].

The sequence $\{W_k\}$ considered so far is a scalar sequence. However,
the approximations remain valid when $\{W_k\}$ is a sequence of vectors
of size m. All that is needed is to map the state space
$\mu\mathbb{Z}^m$ into $\mu\mathbb{Z}$ and to choose a sequence of sets
$B_n$ that converges to $\emptyset$, for example the sets
$([-b\mu, b\mu] \times [-b\mu, b\mu] \times \cdots \times [-b\mu, b\mu])^c$.

Continuous  state  space  case:  The  three  approximations 
are  extended  to  algorithms  with  an  uncountable  state  space 
under  the  assumption  that  the  resulting  Markov  chain  is  Har¬ 
ris  recurrent.  This  assumption  will  be  required  in  order  to 
attach  to  the  chain  a  generic  atom  that  is  visited  infinitely 
often  and  hence  may  be  used  as  a  regeneration  state. 

Examples  of  algorithms  with  both  continuous  and  lattice 
state  space  are  given  to  demonstrate  these  results.  One  of  the 
examples  demonstrates  the  different  behavior  of  clusters  for 
the  LMS  algorithm  and  three  of  its  signed  variants.  Another 
example  demonstrates  the  applicability  of  the  approximations 
in  the  vector  case. 

III. Excursion analysis as $\mu \to 0$

Here we assume that the set B that defines excursions is a fixed subset
of the real line, for example $[b, \infty)$, and examine excursions as
the step size $\mu \to 0$. The results of Section II, where
$B \to \emptyset$, analyze excursions of a single Markov chain and
increase the rarity of excursions by decreasing the set B. In contrast,
in this section, each value of $\mu$ in Eq. (1) defines a Markov chain
with some stationary distribution, say $\pi_\mu$, and the rarity of
excursions is increased by decreasing $\pi_\mu(B)$.

Our approximations for the mean and distribution of $t_b$ are still
valid in this setting. All that remains to be answered is whether the
bounds on the approximation error converge to 0 as $\mu \to 0$. We show
that the approximations for the mean and distribution of $t_b$ hold. The
approximation for the distribution of $S_n$ is shown to hold after some
modification to the definition of an excursion.

Abstract — This paper treats the case where $Z(t)$ is a continuous wide
sense stationary process which is sampled at instants $t_n = n + A_n$,
$n \in \mathbb{Z}$. The series of random gaps $A_n$ is stationary in a
sense weaker than strict second order. We present a necessary and
sufficient condition (NSC) for the exact (mean square) linear
reconstruction of $Z(t)$.


The sequence $U = \{U_n, n \in \mathbb{Z}\}$, where $U_n = Z(t_n)$,
spans a Hilbert space $H(U)$. The problem is to know whether, for any t,
$Z(t) \in H(U)$, or equivalently $H(U) = H(Z)$ ($H(Z)$ is the space
linearly generated by Z). In this case, the observation of the randomly
sampled sequence is enough to reconstruct the original process Z.


I.  Introduction 

Time series often stem from random continuous time processes
($t \in \mathbb{R}$) that we wish to reconstruct. The linear
reconstruction of the underlying process depends on the sampling
technique and on the information we have about the latter.


In the framework of wide sense stationary processes, the case
$t_n = n\theta$ has been completely resolved by Lloyd [1]. Concerning
random sampling, the model $t_n = n\theta + A_n$ is the most frequently
used. When the gaps $A_n$ are known or observed, numerous reconstruction
formulas exist [2] and new ones still appear [3].

On the other hand, few attempts have been made in the case where the
$A_n$ are not observed and are characterized only by their statistical
properties [4]. In what follows we give an NSC for linear reconstruction
without (mean square) error of the underlying process, in the case where
the sequence $A_n$ is stationary in some sense.


II.  Hypothesis 

The hypotheses are denoted $H_0$ for the sampled process
$Z = \{Z(t), t \in \mathbb{R}\}$ (wide sense stationary, mean square
continuous) and $H_1$ for the sequence $A = \{A_n, n \in \mathbb{Z}\}$ of
non-degenerate r.v. (Z and A being assumed independent):


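$$H_0:\ \begin{cases} E[Z(t)] = 0 \\ E[Z(t) Z^*(t-\tau)] = \int_{-\infty}^{+\infty} e^{i\omega\tau}\, dS_Z(\omega) \\ Z(t) = \int_{-\infty}^{+\infty} e^{i\omega t}\, d\Phi_Z(\omega) \end{cases} \qquad (1)$$

$$H_1:\ \begin{cases} \psi(\omega) = E[e^{i\omega A_n}] & (*) \\ \forall q \in \mathbb{Z},\ \varphi_q(\omega) = E[e^{i\omega(A_n - A_{n-q})}] & (**) \\ (*) \text{ and } (**) \text{ do not depend on } n \end{cases} \qquad (2)$$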

$S_Z$ and $\Phi_Z$ are the power spectrum (spectral measure) and the
Cramér-Loève representation of Z, respectively [5].


Let: 


III.  Theorem 


-f  oo 


Gn=  J  d$z  (w) 


—  OO 

Vn  —  U n 


Gn 


$Z(t)$ can be reconstructed linearly without (mean square) error from
the observation of the series $U = \{U_n, n \in \mathbb{Z}\}$, where
$U_n = Z(t_n)$, if and only if:

a) The spectral measures (on $[-\pi, \pi]$) of the two sequences
$G = \{G_n, n \in \mathbb{Z}\}$ and $V = \{V_n, n \in \mathbb{Z}\}$ are
mutually singular.

b) The translated measures $S_Z(\omega + 2\pi k)$, $k \in \mathbb{Z}$,
are mutually singular.


c) If $\Lambda = \{\omega : \psi(\omega) \neq 0\}$, then
$\int_{\Lambda} dS_Z(\omega) = E\left[|Z(t)|^2\right]$.


Remark: 

Condition a) is easily verified in the case where Z has a line spectrum
and A is a continuous r.v. process.

The second condition is due to Lloyd [1].

Condition c) signifies that $\psi(\omega)$ is non-null on the support of
$S_Z(\omega)$.
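
To make the reading of condition c) concrete, the sketch below checks whether the characteristic function $\psi$ of the gaps vanishes on a given set of spectral lines. It is only an illustration under stated assumptions: the gaps are taken to be uniform on $[-a, a]$ (for which $\psi(\omega) = \sin(a\omega)/(a\omega)$), the process is assumed to have a line spectrum at the listed frequencies, and the function names are hypothetical.

```python
import math


def psi_uniform(omega, a):
    """Characteristic function E[exp(i*omega*A)] of A uniform on [-a, a]."""
    if omega == 0.0:
        return 1.0
    return math.sin(a * omega) / (a * omega)


def psi_nonzero_on_lines(freqs, a, tol=1e-9):
    """True if psi does not vanish (up to tol) at any spectral line."""
    return all(abs(psi_uniform(w, a)) > tol for w in freqs)


if __name__ == "__main__":
    print(psi_nonzero_on_lines([0.5, 1.0], a=1.0))   # lines away from zeros
    print(psi_nonzero_on_lines([math.pi], a=1.0))    # line at a zero of psi
```

For uniform gaps the zeros of $\psi$ sit at nonzero multiples of $\pi/a$, so a spectral line at such a frequency would violate the condition as read here.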


IV.  Conclusion 

In this paper, we obtained a necessary and sufficient condition for the
exact linear reconstruction of a stationary stochastic process subjected
to additive random sampling.

Abstract — Distortion measures are proposed on the basis of parametric
filtering, a technique of signal characterization that combines a
parametric filter bank with an analysis of first-order autocorrelation.
Robustness of the distortion measures against narrow-band interference
and spectral notch filtering is investigated.

I.  Introduction 

Given a zero-mean stationary signal $X_t$, consider the demodulated
first-order autocorrelation as defined by

$\gamma_\theta(\eta) := \Re\{e^{-i\theta}\rho(\alpha)\}$  $(-1 < \eta < 1)$,

where $\rho(\alpha)$ is the (ordinary) first-order autocorrelation of
the filtered signal $Y_t(\alpha) := \alpha Y_{t-1}(\alpha) + X_t$ with
$\alpha := \eta e^{i\theta}$. It can be shown [1] [2] that for almost any
$\theta$ the function $\gamma_\theta(\eta)$ uniquely determines the
correlation structure of $X_t$ and hence forms a characterization
function of the signal. The parametric filtering (PF) method is one that
utilizes this characterization property of $\gamma_\theta(\eta)$ for
signal discrimination [1]. In particular, distortion measures can be
derived from $\gamma_\theta(\eta)$.
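
The definition can be tried out numerically. The sketch below estimates the demodulated first-order autocorrelation from a finite sample. It assumes the filter parameter is $\alpha = \eta e^{i\theta}$ and uses the sample estimate $\rho \approx \sum_t Y_t \overline{Y_{t-1}} / \sum_t |Y_{t-1}|^2$; both conventions and the function name are assumptions, not taken from [1] or [2].

```python
import cmath
import random


def demodulated_autocorr(x, eta, theta):
    """Sample estimate of Re{exp(-i*theta) * rho(alpha)} for the filtered
    signal Y_t = alpha * Y_{t-1} + X_t, alpha = eta * exp(i*theta)."""
    alpha = eta * cmath.exp(1j * theta)
    y_prev = 0j
    num, den = 0j, 0.0
    for xt in x:
        y = alpha * y_prev + xt
        num += y * y_prev.conjugate()   # lag-one cross product
        den += abs(y_prev) ** 2         # power of the lagged output
        y_prev = y
    rho = num / den
    return (cmath.exp(-1j * theta) * rho).real


if __name__ == "__main__":
    rng = random.Random(1)
    white = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    print(demodulated_autocorr(white, eta=0.5, theta=1.0))
```

For white noise the filtered output has $\rho(\alpha) = \alpha$ under these conventions, so the estimate should be close to $\eta$ itself; a coloured input bends the curve away from the identity.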

II.  PF-Based  Distortion  Measures 
For any $-1 < \eta_a < \eta_b < 1$, consider the function

Po(v)  |  h'g(r))  +  (7 b{t}+)  +  l)  6{r)  -  r,a) 

+  s(v~Vb)], 

where $\gamma'_\theta(\eta)$ is the derivative of $\gamma_\theta(\eta)$
w.r.t. $\eta$ and $\delta(\eta)$ is the Dirac delta. Using the results in
[1], it can be shown that $p_\theta(\eta)$ not only is equivalent to
$\gamma_\theta(\eta)$ but also forms a generalized pdf in the interval
$[\eta_a, \eta_b]$. This latter property gives rise to many possibilities
of defining distortion measures. For instance, one may define the
Kullback-Leibler information divergence by

"(P?IIPJ):=  /  PeWKiplM/pfa))^, 

where  K(u)  :=  u  —  log  u  —  1.  Since  the  information  di¬ 
vergence  extends  to  non-probability  densities,  one  may 

*T.  H.  Li  is  with  the  Department  of  Statistics. 
f  J.  D.  Gibson  is  with  the  Dept,  of  Electrical  Engineer¬ 
ing.  He  was  partly  supported  by  NSF  grant  NCR-9303805. 


also  define  a  distortion  measure  as 

4P°e4’Pe)  :=k(?1  \p°e/Pe) 

rVb 

=  K (Peiv) /Peiv))  dr], 

''Va 

where  gr*fo)  :=  1  +  8(r)  -  rja )  +  6(r]  -  Vb)  is  the  density 
of  “uniform”  distribution. 

III.  Robustness 

Suppose the signal is contaminated by a narrow-band noise so that $X_t$
has a spectral density of the form
$f_1(\omega) = (1 - \epsilon) f_0(\omega) + \epsilon g(\omega)$, where
$f_0$ is the noise-free spectrum and $g$ is the noise spectrum, with
$g(\omega)$ of order $\Delta^{-1}$ in a band of width of order $\Delta$
around $\pm\omega_0$ and $g(\omega) = 0$ otherwise ($\Delta < 1$). To
quantify the robustness of a distortion measure against the
contamination, we use the second derivative of the distortion measure at
$\epsilon = 0$, known as the local curvature of the distortion measure.

For the widely used Kullback-Leibler (KL) spectral divergence [3],
$D_{KL}(f_1, f_0) := \int K(f_1(\omega)/f_0(\omega))\, d\omega$, it is
easy to show [4] that
$(d^2/d\epsilon^2)\, D_{KL}(f_1, f_0)|_{\epsilon=0} = O(\Delta^{-1})$.
This confirms again that the KL divergence is not robust to narrow-band
contaminations [3].

Compared to the KL spectral divergence, the PF-based distortion measures
exhibit more robustness to narrow-band contaminations. In fact, it is
not difficult to show that with
$1 - \max\{|\eta_a|, |\eta_b|\} \gg \Delta$ the local curvatures of
$\kappa(p_\theta^0 \| p_\theta^1)$ and $d(p_\theta^0, p_\theta^1)$ take
the form of $O(1)$ as $\Delta \to 0$. Similar results can be obtained for
distortions due to spectral notch filtering [4].

Abstract — Existence and uniqueness are established for a
translation-invariant Gibbs measure corresponding to a spatial point
process that has, in addition to inhibition and clustering, the new
feature of penalizing isolated points. This point process has the
so-called two-step Markov property, and the associated density function
is characterized in terms of 2-interaction functions. The asymptotic
normality of certain statistics of the point process is established when
the size of the observation window tends to $\mathbb{R}^2$.

I. A Two-Step Markov Point Process on Bounded Sets

Let $\Omega_f$ denote the set of all finite lists of points from
$\mathbb{R}^2$. A typical element has the form $x = (x_1, \ldots, x_n)$
for some n, where each $x_i \in \mathbb{R}^2$. For $x \in \Omega_f$, the
number of isolated points in x is given by
$I(x) := |\{i : \|x_i - x_j\| > d_2, \forall j \neq i\}|$, where
$d_2 > 0$ is a specified threshold, $\|\cdot\|$ is the Euclidean norm on
$\mathbb{R}^2$, and $|\cdot|$ denotes the cardinality of the indicated
set. Fix $0 < d_1 < d_2$, and let $\psi : [0, \infty) \to [0, \infty)$ be
a bounded function such that $\psi(r) = 1$ whenever $r > d_2$, and
$\psi(r) = 0$ if $r < d_1$. Fix $0 < \gamma < 1$. For $x \in \Omega_f$,
consider the density function
$f(x) = \alpha \gamma^{I(x)} \prod_{i<j} \psi(\|x_i - x_j\|)$, where
$\alpha$ is a normalizing constant [3, Section 2]. The function $\psi$ is
responsible for pairwise interaction and may give rise to clustering and
inhibition [1]. What is new here is the constant $\gamma$, which is
responsible for penalizing realizations with isolated points. We call two
points neighbors if the distance between them is no more than $d_2$. It
can be shown [3] that the ratio $f(x \cup \{\xi\})/f(x)$ depends on
$\xi$, on the points of x that are neighbors of $\xi$, and on the
neighbors’ neighbors. If we consider the probability measure
$f\,d\nu_\Lambda$, where $\Lambda$ is a bounded set and $\nu_\Lambda$ is
the measure corresponding to a Poisson process in $\Lambda$ with constant
intensity $\lambda$, then the conditional probability of an event in
$\Lambda' \subset \Lambda$, given what is in $\Lambda \setminus \Lambda'$,
depends on the points in $\Lambda'$, on the points in
$\Lambda \setminus \Lambda'$ that are neighbors of $\Lambda'$, and on the
neighbors’ neighbors. This fact motivates the term “two-step Markov.” In
[3], we extended Ripley and Kelly’s [7] characterization theorem on
Markov density functions to m-step Markov densities. As a result, we
obtain the representation $f(x) = \alpha \prod_{y \subset x} \Phi(y)$,
where $\Phi$ is a so-called 2-interaction function, i.e.,
$\Phi(y) \neq 1$ implies that every two points in y are either neighbors,
or else there is a third point of y which is a neighbor to both points.
Furthermore, $\Phi(y) = 1$ whenever $\max_{i,j} \|y_i - y_j\| > 2d_2$.
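
The density on bounded sets can be evaluated directly for a small configuration. The sketch below is a minimal illustration, assuming a hypothetical step-shaped pair function that equals 0 below $d_1$, a constant c between $d_1$ and $d_2$, and 1 above $d_2$; the normalizing constant is omitted and all names are illustrative.

```python
import math
from itertools import combinations


def isolated_count(points, d2):
    """I(x): number of points whose distance to every other point exceeds d2."""
    return sum(
        1
        for i, p in enumerate(points)
        if all(math.dist(p, q) > d2 for j, q in enumerate(points) if j != i)
    )


def psi_step(r, d1, d2, c):
    """Hypothetical pair function: 0 below d1, c on [d1, d2], 1 above d2."""
    if r < d1:
        return 0.0
    if r > d2:
        return 1.0
    return c


def unnormalized_density(points, gamma, d1, d2, c):
    """gamma**I(x) times the product of psi over all pairs (no alpha)."""
    value = gamma ** isolated_count(points, d2)
    for p, q in combinations(points, 2):
        value *= psi_step(math.dist(p, q), d1, d2, c)
    return value


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
    print(isolated_count(pts, d2=2.0))
    print(unnormalized_density(pts, gamma=0.5, d1=0.5, d2=2.0, c=1.5))
```

With $\gamma < 1$ each isolated point multiplies the density by a factor below one, which is the penalty described in the text; a pair closer than $d_1$ sets the density to zero (inhibition).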

II. Existence and Uniqueness of a Gibbs Measure on $\mathbb{R}^2$

The goal of this section is to define a point process (having the
features of the point process in the previous section) on the set
$\Omega$ of all lists of points from $\mathbb{R}^2$ whose intersections
with bounded sets are finite. Note that some of the elements of $\Omega$
are infinite lists. Following Preston [6, Chapter 6], we define the
translation-invariant potential function V on $\Omega_f$ by
$V(x) := -\log(f(x)/\alpha) = -\sum_{y \subset x} \log \Phi(y)$. For any
subset A in $\mathbb{R}^2$ and $s \in \Omega$, let $s_A$ denote the
restriction of s to A. Let $\Lambda$ be any bounded subset of
$\mathbb{R}^2$. If x is a list of points from $\Lambda$, and y is a list
of points from $\Lambda^c$, define the conditional potential [6, p. 98]
$V_\Lambda(x \mid y) := -\lim_{\Gamma \uparrow \mathbb{R}^2} \sum_{z \subset x \cup y_\Gamma,\ z \cap x \neq \emptyset} \log \Phi(z)$.
For each temperature and each $\Lambda$, we can define a conditional
probability measure corresponding to the above conditional potential.
This conditional measure partially inherits the property of penalizing
isolated points. The following result is a consequence of
[5, Theorem 2.2 & Remark 2.3], [2] and [3, Lemma 3.2].

Theorem  1:  For  every  sufficiently  large  temperature,  there 
exists  a  unique  translation-invariant  Gibbs  measure  defined 
on  events  on  Q,  and  corresponding  to  the  above  conditional 
measures. 

III.  Asymptotic  Normality 

Assume that
$\psi(r) = \sum_{i=1}^{M+1} \psi_i \mathbf{1}_{[r_{i-1}, r_i)}(r)$, where
$0 = r_0 < d_1 = r_1 < \cdots < r_M = d_2$, $r_{M+1} = \infty$,
$\psi_1 = 0$, $\psi_{M+1} = 1$, and $M > 1$. Observe that the density f
now takes the form
$f(x) = \alpha \gamma^{I(x)} \prod_{i=1}^{M} \psi_i^{S_i(x)}$, where
$S_i(x)$ is the number of pairs of points that are $r_{i-1}$ to $r_i$
units apart. Let $\Lambda_n$, $n = 1, 2, \ldots$, be a sequence of
bounded subsets of $\mathbb{R}^2$ such that
$\Lambda_n \uparrow \mathbb{R}^2$ and $n/\mathrm{area}(\Lambda_n)$
converges to a finite constant as $n \to \infty$. Let n be fixed, and
define $X := (N, S_1, \ldots, S_M, I)$, where N is the total number of
points in a realization of the Gibbs process of the previous section, I
is the total number of isolated points, and $S_i$ is the number of pairs
of points that are within a distance $r_{i-1}$ to $r_i$, all in
$\Lambda_n$. For each $j = (j_1, j_2) \in \mathbb{Z}^2$, let
$U_j := \{u = (u_1, u_2) \in \mathbb{R}^2 : d_2 j_i \le u_i < d_2(j_i + 1),\ i = 1, 2\}$.
Let $J_n$ be the set of indices j for which $\Lambda_n \cap U_j$ is not
empty. The asymptotic normality of X relies on [4, Theorem 2.2] and on
[3, Lemma 3.2].

Theorem 2: If the temperature is sufficiently large, then as
$n \to \infty$, $(X - E[X])/|J_n|^{1/2}$ converges in distribution to a
zero-mean normally distributed $\mathbb{R}^{M+2}$-valued random variable
with a covariance matrix specified in [3, Section 4].

Abstract — It is shown that Markov maps, when subjected to weakly
continuous random perturbations, have an attractive invariant measure
that incorporates the dispersive effects of the perturbations as well as
the ordering effects of the mapping.

I. Markov Maps.

Markov maps are defined on the basis of a finite set of functions
$\{f_i \mid i = 1, 2, \ldots, N\}$ on a compact metric space $(X, d)$.
Associating a probability $p_i$ to every function $f_i$, normalized as
$\sum_i p_i = 1$, a probabilistic dynamics on X is defined by the map
$x \to f_i(x)$, with probability $p_i$. This probabilistic dynamics on X
defines a deterministic dynamics on the set of probability measures on
X, $\mathcal{P}(X)$, by the Markov operator M. For a measure
$\nu \in \mathcal{P}(X)$ and each measurable set $A \subset X$, the
action of M is defined by

$M\nu(A) = \int d\nu\, P_0(A \mid \cdot) = \sum_{i=1}^{N} p_i\, \nu \circ f_i^{-1}(A)$,

where $P_0(A \mid x)$ is the usual Markov transition probability and
$\mathcal{P}(X)$ is endowed with Hutchinson's metric [1]. When the
functions $f_i$ have contractivity factors $s_i < 1$, the Markov operator
M has contractivity factor $s = \max_i s_i < 1$. The dynamics of a Markov
map is very simple: for all initial $\mu$, $M^n \mu$ converges weakly to
the invariant measure. Techniques to encode images as fractals are based
on this fact.
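
The convergence statement can be illustrated by random iteration. The following sketch is only an example under assumed data: two hypothetical contractions on $[0, 1]$, $f_1(x) = x/2$ and $f_2(x) = x/2 + 1/2$, applied with equal probability, for which the invariant measure is the uniform distribution on $[0, 1]$.

```python
import random


def random_iteration(maps, probs, x0, n, seed=0):
    """Apply a randomly chosen map f_i (probability p_i) at each step and
    return the visited points."""
    rng = random.Random(seed)
    x, orbit = x0, []
    for _ in range(n):
        f = rng.choices(maps, weights=probs, k=1)[0]
        x = f(x)
        orbit.append(x)
    return orbit


# Hypothetical example maps with contractivity factor 1/2.
MAPS = [lambda x: 0.5 * x, lambda x: 0.5 * x + 0.5]
PROBS = [0.5, 0.5]

if __name__ == "__main__":
    a = random_iteration(MAPS, PROBS, x0=0.0, n=20000)
    b = random_iteration(MAPS, PROBS, x0=1.0, n=20000)
    print(sum(a) / len(a), sum(b) / len(b))  # both near the mean of U[0, 1]
    print(abs(a[-1] - b[-1]))                # orbits forget the start point
```

Two orbits driven by the same random choices approach each other geometrically, which is the pointwise counterpart of the contraction of M on measures.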

II.  Random  Perturbations. 

An operator $S : \mathcal{P}(X) \to \mathcal{P}(X)$ describes the
stationary random perturbations on X. We restrict to random
perturbations that are specified by their action on atomic measures (for
examples, see [3]), by giving a function
$N : \mathcal{B}(X) \times X \to [0, 1]$ such that
$N(A \mid x) = S\delta_x(A)$ for every point $x \in X$ and any measurable
subset $A \subset X$. The function $N(A \mid \cdot)$ is measurable for
each A. The action of S on any measure $\nu \in \mathcal{P}(X)$ is then
given by $S\nu(A) = \int d\nu\, N(A \mid \cdot)$. The perturbed Markov
map is defined by the combined operation $R = S \circ M$, and it follows
that $R\nu \in \mathcal{P}(X)$ whenever $\nu \in \mathcal{P}(X)$. Written
in terms of N, the perturbed map is
$R\nu(A) = \int P(A \mid \cdot)\, d\nu$, where we introduced
$P(A \mid x) = \sum_i p_i N(A \mid f_i(x))$, the perturbed transition
probability. In the unperturbed limit, $N(A \mid x) = \delta_x(A)$ and
$P(A \mid x)$ reduces to $P_0(A \mid x)$, as it should. Notice that it is
enough to define the function N at points
$x \in W(X) = \bigcup_i f_i(X) \subset X$.

III.  Stability  under  Random  Perturbations. 

The effects of a random perturbation are opposite to the effects of M.
Stability under random perturbations is not evident [2]. A Markov map is
stable under a given perturbation if the corresponding randomly
perturbed Markov operator has a unique attractive invariant measure.
Under a severe random perturbation not every Markov map would be stable.
However, we have found a class of random perturbations that do not
change the contractivity of Markov maps. We call perturbations in this
class weakly continuous perturbations. A perturbation $N(A \mid x)$ is
weakly continuous if

$\left| \int f\, dN(\cdot \mid x) - \int f\, dN(\cdot \mid y) \right| \le d(x, y)$,

for every pair of points x, y in X. Notice that this condition is not a
restriction on the amplitude of the perturbation; it is simply a weak
form of continuity on N.

THEOREM (weak continuity implies stability). Let N be a weakly
continuous random perturbation. Then R has the same contractivity factor
s as M.

The theorem says that under the class of weakly continuous
perturbations, the perturbed Markov operator R has a unique attractive
fixed point. In other words, Markov maps are stable under weakly
continuous perturbations, in the sense that there exists an attractive
invariant measure $\nu$ satisfying the equation $R\nu = \nu$, for any
choice of the set of probabilities $\{p_i\}$.

Weakly continuous perturbations form a class large enough to include the
full class of homogeneous perturbations, i.e., those that are introduced
with the help of independent identically distributed random variables
[3]. Hence, under any translation-invariant perturbation a Markov map
always has an attractive invariant measure. Interesting features of the
noisy invariant measure [4] are that it shows details much finer than
the length scale set by the noise amplitude and that the self-similar
property of the unperturbed invariant measure is lost. At small noise
amplitudes a degraded self-similarity is retained.

I. Introduction

Let $\mathcal{M}$ be the collection of probability measures on a
measurable space $(E, \mathcal{B})$. Consider the binary decision
problem $H_0$ vs. $H_1$ where the statistical hypotheses are represented
by nonparametric families of probability distributions
$\mathcal{P}_i \subset \mathcal{M}$, $i = 0, 1$. Two important special
cases are ε-contamination and total variation families. These families
are defined by

$\mathcal{P}_i = \{Q \mid Q = (1 - \epsilon_i) P_i + \epsilon_i H,\ H \in \mathcal{M}\}$

and

$\mathcal{P}_i = \{Q \in \mathcal{M} \mid \sup_{A \in \mathcal{B}} |Q(A) - P_i(A)| \le \epsilon_i\}$,

respectively, for some $P_i \in \mathcal{M}$ and $0 \le \epsilon_i < 1$.
In these cases the $\mathcal{P}_i$ formalize the possibility of
deviations from the nominal models $P_i$. We seek Neyman-Pearson and
Bayes minimax tests between $\mathcal{P}_0$ and $\mathcal{P}_1$.

A pair of distributions
$(Q_0, Q_1) \in \mathcal{P}_0 \times \mathcal{P}_1$ is called a least
favourable pair if $Q'_0(q_1/q_0 > t) \le Q_0(q_1/q_0 > t)$ and
$Q'_1(q_1/q_0 > t) \ge Q_1(q_1/q_0 > t)$ for all $t \in \mathbb{R}$ and
$(Q'_0, Q'_1) \in \mathcal{P}_0 \times \mathcal{P}_1$. Here $q_0$ and
$q_1$ denote the Radon-Nikodym derivatives of $Q_0$ and $Q_1$ with
respect to a dominating measure $\mu$. As is well known, a solution to
the above minimax problems is provided by a threshold test on the least
favourable pair likelihood ratio [2], [3]. Thus, identification of the
least favourable pair is the key to solving these nonparametric decision
problems.

In his 1965 paper [2], P. J. Huber gave the construction of the least
favourable pair for both ε-contamination and total variation families.
This construction is quite general; in particular, it works in any
measurable space.

Later, Huber and Strassen proved their celebrated abstract minimax
theorem in [3] and [4]. Here the authors assume E to be a Polish space
(i.e., a separable, complete metrizable topological space) with
associated Borel σ-field $\mathcal{B}$. They consider families of the
type $\mathcal{P}_i = \{P \in \mathcal{M} \mid P \le v_i\}$ where the
$v_i$ are set functions defined on $\mathcal{B}$ satisfying

$v_i(\emptyset) = 0,\ v_i(E) = 1$,  (1)

$A \subset B$ implies $v_i(A) \le v_i(B)$,  (2)

$A_n \uparrow A$ implies $v_i(A_n) \uparrow v_i(A)$,  (3)

$F_n \downarrow F$, $F_n$ closed, implies $v_i(F_n) \downarrow v_i(F)$,  (4)

$v_i(A \cup B) + v_i(A \cap B) \le v_i(A) + v_i(B)$.  (5)


A set function satisfying (1)-(4) is called a capacity, and a set
function satisfying (5) is called 2-alternating. Their theorem
establishes the existence of a least favourable pair, but does not give
constructions [3, Theorem 4.1]. Moreover, the conditions (1)-(4) imply
the weak compactness of $\mathcal{P}_i$ [3, Lemma 2.2].

Define $v_i$ by either $v_i(A) = (1 - \epsilon_i) P_i(A) + \epsilon_i$
for $A \neq \emptyset$ (called the ε-contamination capacity) or
$v_i(A) = \min(P_i(A) + \epsilon_i, 1)$ for $A \neq \emptyset$ (called
the total variation capacity). If E is compact, the $v_i$ satisfy
(1)-(5) and the $\mathcal{P}_i = \{P \in \mathcal{M} \mid P \le v_i\}$
are either ε-contamination or total variation families [3, Example 3,
Example 4]. However, if E is not compact, the $v_i$ do not satisfy (4).
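
On a finite space (which is trivially compact) conditions (2) and (5) can be checked by brute force. The sketch below does this for the two capacities just defined; it is an illustration only, with a hypothetical three-point space, a hypothetical nominal distribution, and exact rational arithmetic so that the inequalities are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def all_subsets(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for k in range(len(pts) + 1)
            for c in combinations(pts, k)]


def contamination_capacity(P, eps):
    """v(A) = (1 - eps) * P(A) + eps for nonempty A, v(empty) = 0."""
    def v(A):
        if not A:
            return Fraction(0)
        return (1 - eps) * sum(P[x] for x in A) + eps
    return v


def total_variation_capacity(P, eps):
    """v(A) = min(P(A) + eps, 1) for nonempty A, v(empty) = 0."""
    def v(A):
        if not A:
            return Fraction(0)
        return min(sum(P[x] for x in A) + eps, Fraction(1))
    return v


def is_monotone(v, points):
    subs = all_subsets(points)
    return all(v(A) <= v(B) for A in subs for B in subs if A <= B)


def is_two_alternating(v, points):
    subs = all_subsets(points)
    return all(v(A | B) + v(A & B) <= v(A) + v(B)
               for A in subs for B in subs)


if __name__ == "__main__":
    P = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
    eps = Fraction(1, 10)
    for make in (contamination_capacity, total_variation_capacity):
        v = make(P, eps)
        print(make.__name__, is_monotone(v, P), is_two_alternating(v, P))
```

The continuity conditions (3), (4) and (6) are vacuous on a finite space, so this check says nothing about the non-compact case that motivates the paper.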


A  related  discussion  can  be  found  in  [5].  The  author  in¬ 
troduces  a  class  of  capacities,  denoted  by  special  capacities, 
containing  both  e-contamination  and  total  variation.  For  this 
class,  an  explicit  construction  of  the  least  favourable  pair  is 
given. 

II.  Summary 

In this paper we revisit the abstract minimax theorem of Huber &
Strassen with the goal of removing the weak compactness condition. To do
so, we require different topological conditions. We take E to be a
locally compact space for which every open set is a $K_\sigma$ (i.e., a
countable union of compacts). This setting includes $\mathbb{R}^n$,
$n < \infty$, and “well-behaved” subsets
of  with  their  relative  topology.  We  allow  set  functions  sat¬ 
isfying  (l)-(3),  (5)  and 

$K_n \downarrow K$, $K_n$ compact, implies $v_i(K_n) \downarrow v_i(K)$  (6)

instead of (4). Note that both ε-contamination and total variation
capacities satisfy (6). A set function satisfying (1)-(3) and (6) will
be called a regular Choquet capacity.

We point out that a regular, 2-alternating Choquet capacity can be
extended to the one-point compactification $E'$ of E with the point at
infinity in such a manner that the Huber-Strassen construction of a
least favourable pair applies to the compactified space. Thus, our work
is to construct the appropriate capacity extensions $v'_i$. This is done
within the setup of the theory of capacities as developed by G. Choquet
[1]. Then the $\mathcal{P}'_0 = \{P \le v'_0\}$ vs.
$\mathcal{P}'_1 = \{P \le v'_1\}$ least favourable pair on $E'$ must be
related to the original problem $\mathcal{P}_0$ vs. $\mathcal{P}_1$ on
E. In particular, there is the issue that the $\mathcal{P}'_0$ vs.
$\mathcal{P}'_1$ least favourable pair may put mass at infinity.

The contributions of this paper are as follows. First, we present the
extension via one-point compactifications as discussed above and argue
that the Huber-Strassen construction of the least favourable pair
applies to the compactified space. Second, if the $v_i$ satisfy
$v_i(A) = \inf\{v_i(O) \mid A \subset O,\ E \setminus O \text{ compact in } E\}$,
the $\mathcal{P}'_0$ vs. $\mathcal{P}'_1$ least favourable pair will not
have mass at infinity, and hence we obtain the desired $\mathcal{P}_0$
vs. $\mathcal{P}_1$ least favourable pair. Both ε-contamination and
total variation do indeed satisfy this condition.

The problem of hypothesis testing, which is to decide between two
alternative explanations for the observed data, is one of the standard
problems in statistics. A discrete memoryless source (DMS) is a sequence
of i.i.d. random variables. The distribution of the DMS is either $P_1$
or $P_2$. When a sample is emitted from the source, the observer
attempts to decide which of the hypotheses $H_1 : P_1$ or $H_2 : P_2$ is
correct. The main concern of this problem is to determine the best
asymptotic exponent of the second kind of error probability when the
first kind of error probability is (1) fixed or (2) less than
$2^{-nr}$. These are specified by (1) the well-known lemma of Stein and
(2) the theorem of Hoeffding ([1]), Blahut ([2]), and Csiszár and Longo
([3]) for the hypothesis testing problem with exponential-type
constraint.
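
For a DMS the exponent under the exponential-type constraint has a well-known closed form: the best second-kind exponent is $\min_{P : D(P \| P_1) \le r} D(P \| P_2)$, which reduces to Stein's exponent $D(P_1 \| P_2)$ as $r \to 0$. The sketch below evaluates it by grid search for a binary source; the source parameters, the grid resolution and the function names are hypothetical, and logarithms are taken to base 2.

```python
import math


def kl_bits(p, q):
    """D(Bernoulli(p) || Bernoulli(q)) in bits, for 0 < q < 1."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total


def hoeffding_exponent(p1, p2, r, grid=10000):
    """Grid-search value of min over P with D(P||P1) <= r of D(P||P2)
    for binary distributions P = Bernoulli(i / grid)."""
    best = math.inf
    for i in range(grid + 1):
        p = i / grid
        if kl_bits(p, p1) <= r:
            best = min(best, kl_bits(p, p2))
    return best


if __name__ == "__main__":
    p1, p2 = 0.3, 0.7  # hypothetical source parameters
    for r in (0.0, 0.05, 0.2, 0.5):
        print(r, hoeffding_exponent(p1, p2, r))
```

The exponent decreases as r grows and reaches zero once r is at least $D(P_2 \| P_1)$, since the constraint set then contains $P_2$ itself.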

The DMS is an idealized model. A more robust model is the arbitrarily
varying source (AVS), where the source distribution may vary within a
certain set of distributions from one time instant to the next. The
varying behavior of the distribution of the AVS is not known exactly to
us, and there are only two alternatives. We consider the problem of
hypothesis testing for the AVS in the same way as for the DMS, and
determine the best asymptotic exponent of the second kind of error
probability when the first kind of error probability is (1) fixed or
(2) less than $2^{-nr}$. These results generalize the well-known lemma
of Stein and the theorem of Hoeffding, Blahut, and Csiszár and Longo in
statistics. As a corollary in information theory, the best asymptotic
error exponent and Strassen’s theorem for AVS coding are obtained.
Furthermore, we determine the best asymptotic error exponent and the
r-optimal rate (the minimum compression rate when the error probability
is less than $2^{-nr}$, $r > 0$) of AVS coding with a fidelity
criterion.

Let $\mathcal{W} = \{W(\cdot \mid s) \mid s \in S\}$ be a set of
probability distributions on $\mathcal{X}$. An AVS defined by
$\mathcal{W}$ is a sequence of random variables $\{X_i\}_{i=1}^{\infty}$
such that the distribution of $X = (X_1, \ldots, X_n)$ is an unknown
element of $\mathcal{W}^n$.

In the problem of hypothesis testing, $\mathcal{W}$ is not exactly known
to the statistician. There are only two alternative hypotheses for
$\mathcal{W}$. Let
$\mathcal{W}_i = \{W_i(\cdot \mid s) \mid s \in S\}$, $i = 1, 2$, be two
sets of probability distributions on $\mathcal{X}$, with
$H_1 : \mathcal{W} = \mathcal{W}_1$ and
$H_2 : \mathcal{W} = \mathcal{W}_2$. When a sequence
$x = (x_1, \ldots, x_n)$ is emitted from the source, the statistician
attempts to decide, by observing the data x, which of the hypotheses
$H_1$ or $H_2$ is correct. The decision rule is characterized by a set
$A \subset \mathcal{X}^n$. The statistician declares that $H_1$ is true
if $x \in A$, and that $H_2$ is true if $x \in A^c$. The first kind of
error probability is

$\alpha = \max_{s \in S^n} W_1^n(A^c \mid s)$.

The second kind of error probability is

$\beta = \max_{t \in S^n} W_2^n(A \mid t)$.

Given $r > 0$, we denote

$\beta_n(r) = \inf_{A:\ \alpha \le 2^{-nr}} \beta$.

When $|S| = 1$, it is the problem of hypothesis testing for the DMS. For
$i = 1, 2$, denote

$\overline{\mathcal{W}}_i = \{\sum_{s \in S} \lambda_s W_i(\cdot \mid s) \mid 0 \le \lambda_s \le 1,\ \sum_{s \in S} \lambda_s = 1\}$.

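Theorem 1

$$ \lim_{n \to \infty} \Big[ -\frac{1}{n} \log \beta_n(r) \Big] = \min_{P \in \hat{\mathcal{W}}_2}\ \min_{Q \in \hat{\mathcal{W}}_1}\ \min_{\hat{P} :\ D(\hat{P} \| Q) \le r} D(\hat{P} \,\|\, P), $$

where the right-hand side is positive.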
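
For intuition, the DMS special case |S| = 1 can be evaluated numerically. The
sketch below is an illustration and is not taken from the abstract: it assumes
two strictly positive distributions `p1` and `p2` on a finite alphabet and uses
the standard fact that the minimizer of the Hoeffding-type problem lies on the
exponential family $P_\lambda \propto p_1^{1-\lambda} p_2^{\lambda}$. The
function names are hypothetical, and logarithms are base 2 to match $2^{-nr}$.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilt(p1, p2, lam):
    """Normalized geometric mixture p1^(1-lam) * p2^lam."""
    w = [a ** (1 - lam) * b ** lam for a, b in zip(p1, p2)]
    z = sum(w)
    return [x / z for x in w]

def hoeffding_exponent(p1, p2, r, iters=60):
    """min over P with D(P || p1) <= r of D(P || p2), single-state case."""
    if r >= kl(p2, p1):
        return 0.0                      # p2 itself satisfies the constraint
    lo, hi = 0.0, 1.0
    for _ in range(iters):              # D(P_lam || p1) increases with lam
        mid = (lo + hi) / 2
        if kl(tilt(p1, p2, mid), p1) < r:
            lo = mid
        else:
            hi = mid
    return kl(tilt(p1, p2, lo), p2)

p1 = [0.5, 0.5]
p2 = [0.9, 0.1]
stein_limit = hoeffding_exponent(p1, p2, 0.0)   # r = 0 gives D(p1 || p2)
```

With r = 0 the constraint forces P = p1, so the value reduces to the Stein
exponent D(p1 || p2); the exponent then decreases as r grows.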

Abstract — In this paper, we present a new formula for the rate of maxima in
the envelope of a normal process. In contrast to the Rice formula, our result
is simple, and it is not limited to a process with even symmetry in its
one-sided spectrum.

I. Introduction

In the early days of statistical communication theory, the pioneering works of
Rice, Middleton, Lawson, Uhlenbeck, and others established certain fundamental
statistical properties of the Normal Process Envelope (NPE). Still, a large
number of problems about the NPE remain unresolved. One of them is the rate of
maxima in an NPE having an unsymmetrical spectrum.

II. Rice Formula for the Rate of Maxima

In his classic paper [1], Rice derived the following formula for an NPE with
even symmetry in its one-sided spectrum:

(a2- \)2  b2  A/2  y  r(w/2  +  5/4)  An
(2a) 5/2  r(n/2  +  7/4)  an                      (1)

where:

3-a
h  m
,  A  =y  ±*fL(n-m  +  l)—                        (2)
z  ml
m= 0

In the above formulas, $b_n$ is the n-th spectral moment:

$$ b_n = (2\pi)^n \int_0^\infty w(f)\,(f - f_c)^n\,df $$          (3)

where w(f) is the one-sided spectrum of the normal process and $f_c$ is its
center frequency.

Formula (1) is very complicated, and holds only under the assumption of even
symmetry of w(f) around $f_c$.

III. New Formula for the Rate of Maxima

One way of computing N is to derive an analytic expression for the bivariate
joint pdf of two samples of R'(t), the time derivative of the NPE.
Unfortunately, it is very difficult to derive this bivariate pdf assuming an
unsymmetric w(f). Thus, we have approximated it by the first few terms of its
2D Hermite polynomial expansion [2].

Using a level-crossing formula developed in [2], and after admittedly very
cumbersome calculations, the above approximation for the bivariate pdf yields
the following result:

1  b0b4+3b22 -4fcj  ^3,1/2
Napnr~  0  (  2  ^                                (4)

To examine the accuracy of (4), we considered several spectra with various
mathematical forms. For symmetrical spectra, N was computed using (1). For
unsymmetrical spectra, however, there is no closed-form formula in [1] or in
the related literature, so N was computed numerically using a very complicated
triple integral (which can be derived from [1]). In all cases, the relative
error of (4) was found to be below 5%, which is a considerable success for this
formula.

Surprisingly, (4) is exactly the same result that we reported in [3], which was
obtained by a completely different approach.

It should be noted that (4) is an approximate formula, and its accuracy cannot
reasonably be established from several numerical examples alone. However, our
recent results, which will be reported later, show that (4) can be corrected
simply by multiplying it by a correction coefficient:

$$ N \approx K\,N_{\mathrm{aprx}} $$                              (5)

where K depends on the spectral moments. We observed that for the above
examples, K is very close to one.
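
The spectral moments $b_n$ of (3) are the only inputs these formulas need, and
they are easy to estimate numerically. The sketch below is an illustration, not
the authors' procedure: it applies the trapezoidal rule on a finite band
[0, f_max], and the function name, the two example spectra, and the choice of
f_c = 100 are assumptions made for the example.

```python
import math

def spectral_moment(w, f_c, n, f_max, steps=20000):
    """Trapezoidal estimate of b_n = (2*pi)^n * int_0^f_max w(f) (f - f_c)^n df."""
    h = f_max / steps
    total = 0.0
    for k in range(steps + 1):
        f = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * w(f) * (f - f_c) ** n
    return (2 * math.pi) ** n * total * h

def symmetric_spectrum(f):
    """Hypothetical Gaussian-shaped spectrum, even about f_c = 100."""
    return math.exp(-((f - 100.0) ** 2) / (2 * 5.0 ** 2))

def one_sided_spectrum(f):
    """Hypothetical unsymmetrical spectrum: exponential tail above f = 100."""
    return math.exp(-(f - 100.0) / 5.0) if f >= 100.0 else 0.0

b0 = spectral_moment(symmetric_spectrum, 100.0, 0, 200.0)
b1 = spectral_moment(symmetric_spectrum, 100.0, 1, 200.0)   # vanishes by symmetry
b2 = spectral_moment(symmetric_spectrum, 100.0, 2, 200.0)
b1_unsym = spectral_moment(one_sided_spectrum, 100.0, 1, 200.0)
```

For the symmetric example the odd moment `b1` vanishes, while the unsymmetrical
example gives a nonzero `b1_unsym`; odd moments such as these are what separate
the unsymmetrical case from the symmetric one.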

Abstract — In search of a nonparametric indicator of deterministic signal
complexity, we link the Rényi entropies to time-frequency representations. The
resulting measures show promise in several situations where concepts like the
time-bandwidth product fail.

I.  Introduction 

The term component is ubiquitous in the signal processing literature.
Intuitively, a component is a concentration of energy in some domain, but this
notion is difficult to translate into a quantitative concept. In fact, the
concept of a signal component has never been — and may never be — clearly
defined. In this paper, rather than address the question “what is a component?”
directly, we investigate a class of quantitative measures of deterministic
signal complexity and information content. While they do not yield direct
answers regarding the locations and shapes of components, these measures are
intimately related to the concept of a signal component, the connection being
the intuitively reasonable supposition that signals of high complexity (and
therefore high information content) must be constructed from large numbers of
elementary components.

Our approach to complexity is based on entropy functionals and exploits the
powerful analogy between deterministic signal energy densities and probability
densities. For example, the Wigner time-frequency representation (TFR),
$W_s(t,f) = \int s(t + \tau/2)\, s^*(t - \tau/2)\, e^{-j2\pi\tau f}\, d\tau$,
which indicates the joint time-frequency content in a signal s, marginalizes to
the time and frequency energy densities $\int W_s(t,f)\,df = |s(t)|^2$ and
$\int W_s(t,f)\,dt = |S(f)|^2$. The TFRs $C_s(t,f)$ of Cohen’s class form an
infinite set of generalizations of the Wigner TFR.

The probabilistic analogy evoked by the marginals suggests the Shannon entropy
$H(C_s) = -\iint C_s(t,f) \log_2 C_s(t,f)\, dt\, df$ as a natural candidate for
estimating the complexity of a signal through its TFR: The peaky TFRs of
signals composed of small numbers of elementary components would yield small
entropy values, while the diffuse TFRs of more complicated signals would yield
large entropy values. Unfortunately, however, the negative values taken on by
the Wigner distribution and most other Cohen’s class TFRs prohibit the
application of the Shannon entropy due to the logarithm.

II.  The  Renyi  Entropies 

We propose to sidestep this negativity issue by employing the Rényi entropies
[1,2] $H_\alpha(C_s) = \frac{1}{1-\alpha} \log_2 \iint C_s^\alpha(t,f)\, dt\, df$,
which generalize the Shannon entropy to a family parameterized by
$\alpha > 0$. The resulting time-frequency information measure has a number of
attractive properties. In addition to immunity to the negative TFR values that
invalidate the Shannon approach [2], the third-order Rényi entropy measures
signal complexity [1,2]: The information $H_3(C_s)$ in the TFR of the sum
s(t) = g(t) + g(t − T) of two separated signal components saturates (as
T → ∞) exactly one bit above the value $H_3(C_g)$ for a single component.

1 Supported by NSF Grant MIP-9457438, ONR Grant N00014-95-1-0849, Texas ATP
Grant 003604-002, and URA 1325 CNRS.
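
The one-bit saturation can be checked on a toy discrete example. The sketch
below is an illustration under simplifying assumptions and is not the authors'
computation: the "TFR" is a nonnegative Gaussian blob on a small grid rather
than a true Wigner distribution (so it has no oscillating cross-components),
the two components have disjoint supports, and all names are hypothetical.

```python
import math

def renyi_entropy(tfr, alpha):
    """Discrete Renyi entropy (bits) of a 2-D array, normalized to unit sum.

    alpha is assumed to be an integer >= 2, so that negative entries
    (possible in real TFRs) can still be raised to the power alpha.
    """
    total = sum(v for row in tfr for v in row)
    s = sum((v / total) ** alpha for row in tfr for v in row)
    return math.log2(s) / (1 - alpha)

def gaussian_blob(size=16, width=2.0):
    """Toy single-component distribution on a size x size time-frequency grid."""
    c = (size - 1) / 2
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * width ** 2))
             for j in range(size)] for i in range(size)]

one_component = gaussian_blob()
# Two well-separated copies of the same component along one axis.
two_components = [row + row for row in one_component]

gain = renyi_entropy(two_components, 3) - renyi_entropy(one_component, 3)
```

Here `gain` equals one bit up to rounding, because duplicating a normalized
component on disjoint supports multiplies the sum of cubes by $2^{1-\alpha}$.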

Our goal has been a detailed study of the properties and applications of these
promising complexity measures, with emphasis on establishing a firm
mathematical foundation. Interesting properties include the following [2]:

1. For integer orders $\alpha > 1$, $H_\alpha(C_s)$ is defined for essentially
all key TFRs, including even those distributions taking locally negative
values.

2. For odd orders $\alpha > 1$, $H_\alpha(C_s)$ is asymptotically invariant to
TFR “cross-components” and therefore does not count them.

3. $H_\alpha(W_s)$ exhibits extreme sensitivity to phase differences between
closely spaced components (ameliorated by time-frequency smoothing).

4. The range of $H_\alpha(W_s)$ values is bounded above and below. A single
Gaussian pulse attains the lower bound, while “deterministic white noise” nears
the upper bound.

5. The value of $H_\alpha(W_s)$ is invariant to arbitrary time and frequency
shifts, scale changes, and shears and rotations in the time-frequency plane.

In recent work, we have applied the Rényi measures to random signals,
introduced the notion of a Rényi dimension, and suggested how these measures
can be employed to improve TFR performance through adaptivity.

Finally, we have introduced a new “Jensen-like” divergence measure [3]. While
this quantity promises to be a useful indicator of the distance between two
time-frequency distributions, it is currently limited to the analysis of
positive definite TFRs. In spite of this rather severe limitation, this measure
could prove useful for time-frequency based detection and recognition.

Abstract — The covariance and spectral properties of the wavelet transform and
of the discrete wavelet coefficients, in the orthonormal series representation,
of second-order random fields on $R^n$ are determined. Both weakly homogeneous
random fields and random fields with weakly homogeneous increments are
considered. Weakly isotropic fields and fields with weakly isotropic increments
are also considered. Applications to fractional Brownian fields on $R^n$ are
given.

I. Introduction

Let $X = \{X(t,\omega), t \in R^n\}$ be a possibly complex-valued random field
which is jointly measurable in t and ω. We consider second-order random fields
with zero mean and covariance function $C_X(t,s) = E[X(t)X^*(s)]$, where *
denotes complex conjugate. Let $\psi(t)$, $t \in R^n$, be an analyzing wavelet.
The continuous wavelet transform of the random field X at scale a > 0 is
defined by

$$ W_a(t,\omega) = \int_{R^n} X(u,\omega)\,\psi((u - t)/a)\,du $$          (1)

so that $\{W_a(t,\omega), t \in R^n\}$ is a random field for each scale a > 0.

Let $\{V_j, j \in Z\}$ be a multiresolution approximation of $L^2(R^n)$ and
$W_j$ the orthogonal complement of $V_j$ in $V_{j+1}$. Let
$\{\phi_{l,k}(t), k \in Z^n\}$ be an orthonormal basis for $V_l$ and let
$\{\psi_{p,j,k}(t), p = 1, \ldots, 2^n - 1, k \in Z^n\}$ be an orthonormal
basis for $W_j$ [2]. Define the approximation coefficients
$\{a_{l,k}, k \in Z^n\}$ at resolution $2^{-l}$ by

$$ a_{l,k}(\omega) = \int_{R^n} X(t,\omega)\,\phi_{l,k}(t)\,dt $$          (2)

and the detail coefficients $\{b_{p,j,k}, k \in Z^n\}$ at detail level
$2^{-j}$ by

$$ b_{p,j,k}(\omega) = \int_{R^n} X(t,\omega)\,\psi_{p,j,k}(t)\,dt. $$     (3)

Under certain integrability conditions (see [1] for details),
$\{a_{l,k}, k \in Z^n\}$ and $\{b_{p,j,k}, k \in Z^n\}$ are discrete-time
second-order random fields on $Z^n$.

Our goal is to determine the covariance and spectral properties of the random
fields $\{W_a(t), t \in R^n\}$, $\{a_{l,k}, k \in Z^n\}$ and
$\{b_{p,j,k}, k \in Z^n\}$, and to see whether they inherit the features of the
input process X (weakly homogeneous, weakly homogeneous increments, weakly
isotropic). A representative result is given in Section II.

II. Representative Result

We suppress the ω-argument in the sequel. Consider a possibly complex-valued
measurable random field $\{X(t), t \in R^n\}$ with weakly homogeneous
increments [3].

We are concerned with the covariance and spectral properties of the wavelet
transform and approximation and detail coefficients, as defined in (1), (2),
and (3), respectively, of the random field $\{X(t), t \in R^n\}$ itself (not of
its increments).

Theorem 1. Assume that $\int_{R^n} \psi(t)\,dt = 0$. Then the wavelet
transforms $\{W_a(t), t \in R^n\}$, a > 0, are jointly weakly homogeneous
random fields with zero means and covariance/cross-covariance function

$$ C_{W_{a_1},W_{a_2}}(t) = E[W_{a_1}(t + u)\,W^*_{a_2}(u)] $$

having the spectral representation

$$ C_{W_{a_1},W_{a_2}}(t) = (a_1 a_2)^{n/2} \int_{R^n \setminus \{0\}} e^{i t \cdot \lambda}\, \hat{\psi}^*(a_1 \lambda)\, \hat{\psi}(a_2 \lambda)\, F(d\lambda) + (a_1 a_2)^{1+n/2} \int_{R^n} \int_{R^n} (Au) \cdot v\; \psi(u)\, \psi(v)\, du\, dv $$          (4)

where $F(d\lambda)$ is a measure on $R^n \setminus \{0\}$ satisfying

(5)

$R^n \setminus \{0\}$ denotes the Euclidean space $R^n$ minus the vector 0,
$A = [a_{ij}]$ is a nonnegative definite Hermitian matrix, and
$\hat{\psi}(\lambda)$ is the Fourier transform of $\psi(u)$.

Remark 1. Note that while the input field X is not weakly homogeneous, the
wavelet transforms at distinct scales are jointly weakly homogeneous. Their
spectral and cross-spectral distributions can be obtained from (4). When the
first-order moments of $\psi$ vanish, the second term on the right side of (4)
is equal to zero.

Results analogous to Theorem 1 are given for the discrete-time second-order
random fields $\{a_{l,k}, k \in Z^n\}$ and $\{b_{p,j,k}, k \in Z^n\}$.
Applications to fractional Brownian fields on $R^n$ are also given. Full
details can be found in [1].

Abstract — In this paper, a technique for providing unequal error protection is
investigated. It relies on a transform approach to coding and makes use of the
wavelet transform over finite fields.

I.  Introduction 

This paper deals with the application of finite field wavelets to build
error-correcting codes that provide unequal error protection to the codewords.
Traditional coding theory usually constructs codes providing uniform error
protection to all codewords. However, many image and speech processing
applications require some codewords to be more protected than others. Examples
of this kind include Differential Pulse Code Modulation (DPCM), where the
effect of an error on the most significant bit (MSB) is much greater than on
the least significant bit (LSB). Similarly, in Linear Predictive Coding (LPC),
a technique often used for speech transmission, the filter coefficients are
much more important than the raw data. One way to provide additional error
protection to some of the codewords is to give all codewords the highest
protection required for any data, but this is not bandwidth efficient.
Additional error protection calls for more redundancy, leading to a lower rate.

II.  Finite  Field  Wavelet  Transforms 

A general theory of multiresolution analysis can be developed (cf. [1]) over
$L^2(\mathbb{R})$. In this paper, we will only use finite-length cyclic wavelet
transforms as described in [2], [3] and [4]. We will refer to the mother
wavelet in such a formulation as g and to the 2-circulant [5] matrix generated
by it as G. Similarly, we have the complementary matrix H. (See [3] for
details.)

III.  Design  of  the  Code 

Transform-domain study of codewords has been of great interest [6]. To use
wavelet transforms to design codes, we make use of the fact that by choosing a
mother wavelet properly, for a wide class of codes, codewords that have a zero
bandpass coefficient have non-zero lowpass coefficients. In the successive
transform levels, only a few codewords still yield non-zero coefficients and
hence can be protected more. An example of this kind of code is the extended
Hamming [8,4] code (i.e., a Hamming (7,4) code with a parity bit) with the
mother wavelet (1 −1 0 0 0 0 0 0). We will call this the Haar wavelet transform
and call the generated matrix $G_h$.

IV.  Reed-Muller  Codes 

Reed-Muller codes can in general be represented as Boolean functions completely
specified by specifying a set of basis vectors. A first-order Reed-Muller code
uses only the first-order terms, while an nth-order code uses product terms up
to order n. Of course, if the length of the codewords is $2^n$, only codes of
up to order n exist. The matrix $G_h$ of appropriate order works well for these
codes, as do some other wavelets derived from a set of codewords.

1 This research was supported by the US Office of Naval Research under Grant
N00014-94-1-0115.


V.  Concatenated  Codes 

Concatenated codes can be dealt with in general. As an example, let
$\mathcal{C}$ be an (n,k) code. Then, we form a (2n, 2k) code of the form
(c, c) by concatenating two codewords, where $c \in \mathcal{C}$. It is easy to
see that if G is the 2-circulant matrix formed from the mother wavelet that
works for the code $\mathcal{C}$, then $G \otimes I_2$ will work for the
concatenated code.

As the next level of complexity, let us consider a code $\mathcal{C}'$
generated from a linear code $\mathcal{C}$ in the following manner: Let A be
the generator matrix for the code $\mathcal{C}$. Then the generator matrix for
the code $\mathcal{C}'$ is given by

$$ A' = \begin{pmatrix} A & A & 0 \\ 0 & A & A \end{pmatrix} $$

Thus, any codeword in $\mathcal{C}'$ is of the form $(c_1, c_1 + c_2, c_2)$,
where $c_1, c_2 \in \mathcal{C}$. With the assumption that the code
$\mathcal{C}$ is linear, we get $c_1 + c_2 \in \mathcal{C}$. Hence, the matrix
$G \otimes I_3$ works for this case. It is now obvious how to deal with any
concatenated code of this form. In fact, depending on applications, we can
choose a proper transform to achieve the required bit-rate.
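
The block structure of this generator matrix can be checked by brute force on a
small binary code. The sketch below is an illustration only: the (3,2)
single-parity-check code used for A is a hypothetical choice, arithmetic is
over GF(2), and the helper names are not from the paper.

```python
from itertools import product

def encode(msg, gen):
    """Multiply a message row vector by a generator matrix over GF(2)."""
    return [sum(m * g for m, g in zip(msg, col)) % 2 for col in zip(*gen)]

def stacked_generator(a):
    """Build [[A, A, 0], [0, A, A]] from a k x n generator matrix A."""
    zero = [0] * len(a[0])
    top = [row + row + zero for row in a]
    bottom = [zero + row + row for row in a]
    return top + bottom

# Hypothetical base code: the (3,2) single-parity-check code.
A = [[1, 0, 1],
     [0, 1, 1]]
k, n = len(A), len(A[0])
A_prime = stacked_generator(A)
base_code = {tuple(encode(m, A)) for m in product([0, 1], repeat=k)}

def has_expected_form():
    """Every codeword of A' must look like (c1, c1 + c2, c2) with c1, c2 in C."""
    for m in product([0, 1], repeat=2 * k):
        w = encode(m, A_prime)
        c1, middle, c2 = w[:n], w[n:2 * n], w[2 * n:]
        if tuple(c1) not in base_code or tuple(c2) not in base_code:
            return False
        if middle != [(x + y) % 2 for x, y in zip(c1, c2)]:
            return False
    return True
```

Because the base code is linear, the middle block is itself a codeword, which
is the property the text uses to argue that one transform serves all three
blocks.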

VI.  The  Dual  Code 

Direct sum of two different codewords can be handled by using the fact that if
$\mathcal{C}$ and $\mathcal{D}$ are two codes, then
$(\mathcal{C} + \mathcal{D})^\perp = \mathcal{C}^\perp \cap \mathcal{D}^\perp$
(see [7]). The idea is to use codes that have as a subcode a self-dual code.
Then, using the above property, if we can find a mother wavelet in the
intersection, a description can be obtained. Even-weight repetition codes are
an example of such codes.

VII.  Future  Directions 

It  is  thus  possible  to  characterize  a  large  class  of  codes.  The 
next  step  of  complexity  is  in  finding  descriptions  of  codes  that 
are  direct  sums  of  two  other  codes.  This  is  useful  in  finding  a 
description  for  the  Golay  code. 

Abstract — Computing the Fast Wavelet Transform of rational input sequences
using algebraic scaling coefficients affords only a finite extension field K
over $\mathbb{Q}$ rather than the field of complex numbers. We use Galois
theoretic methods to study this extension field.

I.  Introduction 

Orthonormal wavelet bases are usually constructed by the tools of
multiresolution analysis, cf. [2]. At the heart of a multiresolution analysis
stands a so-called scaling function $\varphi$. This scaling function satisfies
a dilation equation, which can be written in Fourier space as
$\hat{\varphi}(\omega) = m_0(\omega/2)\,\hat{\varphi}(\omega/2)$, where
$m_0(\omega) = \sum_n h_n e^{in\omega}$. In what follows, we assume compactly
supported scaling functions with algebraic coefficients $h_n$, i.e., every
coefficient $h_n$ is an element of an algebraic number field. From the
multiresolution analysis axioms one derives the simple relation
$|m_0(\omega)|^2 + |m_0(\omega + \pi)|^2 = 1$. Therefore, it is convenient to
construct the transfer function $m_0(\omega)$ from its squared modulus
$|m_0(\omega)|^2$ with the help of the following:

Theorem 1 (Fejér-Riesz) Let $A(\omega)$ be a real nonnegative even
trigonometric polynomial

$$ A(\omega) = \sum_{m=0}^{M} a_m \cos m\omega, \quad \text{with } a_m \in \mathbb{R}. $$

Then it is possible to construct a real trigonometric polynomial
$B(\omega) = \sum_{m=0}^{M} b_m e^{im\omega}$, with $b_m \in \mathbb{R}$, of
the same order M, such that $A(\omega) = |B(\omega)|^2$.


II.  Algebraic  Scaling  Coefficients 

In the case of trigonometric polynomials $|m_0(\omega)|^2$ with algebraic
coefficients, the following theorem ensures that $m_0(\omega)$ has algebraic
coefficients, too.

Theorem 2 ([1]) The coefficients $a_m$ of the trigonometric polynomial
$A(\omega)$ are algebraic if and only if the coefficients $b_m$ of $B(\omega)$
are also algebraic.

Theorem 2 can be proved by extending Daubechies’ proof of Theorem 1 [2], but
using minimal splitting fields instead of the algebraically closed field
$\mathbb{C}$. The main steps in the proof can be sketched as follows:

1. Rewrite the trigonometric polynomial $A(\omega)$ as a polynomial $p_A$ in
$\cos\omega$. The polynomial $p_A$ can be factorized over a minimal splitting
field E as $p_A(c) = \mathrm{lc}(p_A) \prod_{j=1}^{M} (c - c_j)$. Here,
lc(·) denotes the leading coefficient.

2. Build a self-reciprocal polynomial $P_A$ by substituting
$c := (z + z^{-1})/2$ in $p_A(c)$ and multiplying by $z^M$. Therefore, the
resulting polynomial is of the following form:
$P_A(z) = \mathrm{lc}(p_A) \prod_{j=1}^{M} \left( \tfrac{1}{2} - c_j z + \tfrac{1}{2} z^2 \right)$.
Factorize $P_A(z)$ in a minimal splitting field D.

3. Choose a zero $z_j$ from every factor
$\left( \tfrac{1}{2} - c_j z + \tfrac{1}{2} z^2 \right)$, $1 \le j \le M$, and
build a new polynomial $P_B(z) = u \prod_{j=1}^{M} (z - z_j)$, where
$u \in K$ is just a normalization factor. The trigonometric polynomial
$B(\omega)$ is obtained from $P_B$ by $B(\omega) = P_B(e^{i\omega})$. Thus, the
field K is generated by elementary symmetric functions of the zeros $z_j$.

1 This work was supported by DFG under project Be 877/6-2.

Hence,  from  a  field  theoretic  point  of  view  the  situation  can 
be  summarized  by  the  following  diagram: 


        D
       / \
      E   K
       \ /
        F

III.  Galois  Theoretic  Analysis 

From the very construction, we see that the fields E and D are Galois
extensions over F. We discuss some of their properties through a sequence of
lemmas and corollaries.

Lemma 1 The Galois group Gal(D/E) is isomorphic to $(Z/2Z)^m$, with
$m \le M$.

From this observation we easily derive the following result about the structure
of the Galois group.

Lemma 2 The Galois group Gal(D/F) is the extension of the elementary abelian
normal 2-subgroup Gal(D/E) by the group Gal(E/F).

As a consequence, we get an upper bound for the order of the Galois group
Gal(D/F), which is helpful in the estimation of this group.

Corollary 3 We have the following upper bound for the field degree of D/F:

$$ [D : F] \le 2^M \cdot |\mathrm{Gal}(E/F)| \le M! \cdot 2^M. $$

By carefully studying the structure of K, we obtain

Lemma 4 The field D is generated by the composition field EK.

Corollary 5 The field degree [K : F] is at least |Gal(D/E)|.

The close connection between the fields D and K can be exemplified by the
following

Lemma 6 If the field degree of D/E is maximal, i.e., $[D : E] = 2^M$, then the
Galois closure of K is the field D.

Abstract — In this paper we present a methodology to analyze functions in
$\ell^2$ in terms of self-similar discrete-time biorthogonal functions at
different resolution levels. We call these functions discrete-time biorthogonal
wavelets, and they verify in $\ell^2$ the same properties that biorthogonal
wavelets do in $L^2$, including self-similarity.

I.  Introduction 

One of the most well-known cases of Multiresolution Analysis (MA) [1] for
functions $f \in L^2(\mathbb{R})$ is characterized by solutions of the equation

$$ \phi(x) = \sum_k a_k\, \phi(2x - k) $$          (1)

with $\phi(x) \in L^2$, $a_k \in \mathbb{R}$, $k \in \mathbb{Z}$.

The family $\phi_{mk}(x) = 2^{-m/2} \phi(2^{-m} x - k)$, with
$k, m \in \mathbb{Z}$, is a powerful tool to analyze the behaviour of functions
at different locations and resolutions. As $\phi(x)$ is defined on $L^2$, it is
not possible to apply this theory directly to a discrete-time signal
$g \in \ell^2$. For discrete functions, it is necessary first to build
$f \in L^2$ from g, and then apply an MA to f to study its properties and, from
them, extrapolate the properties of g. In this paper we overcome these
drawbacks by developing the theory of wavelets directly on $\ell^2$ and giving
conditions to obtain families of discrete-time biorthogonal wavelets.

II.  Discrete  Multiresolution  Analysis 

We define a Discrete Multiresolution Analysis (DMA) V by a set of closed
subspaces $V_m \subset \ell^2(\mathbb{R})$, $m \in \mathbb{N}$,
$\cdots \subset V_2 \subset V_1 \subset V_0 = \ell^2$, where
$\bigcap_m V_m = \{0\}$, and where each subspace verifies that
$\phi_m \in V_m$ exists so that the set of functions
$\{\phi_{mk}\}_{k \in \mathbb{Z}}$ is a basis for $V_m$, with
$\phi_{mk}[n] = \phi_m[n - 2^m k]$, $n \in \mathbb{Z}$. A direct consequence of
this definition is the relationship between basis functions from adjacent
subspaces given by $\phi_{m+1} = \sum_k a^m_k \phi_{mk}$,
$a^m_k \in \mathbb{R}$.

We introduce the self-similarity criterion among functions at different
resolution levels, stating that a DMA is self-similar (SSDMA) if and only if

$$ \forall m \in \mathbb{N}, \quad f[n] \in V_{m+1} \Leftrightarrow f[2n] \in V_m. $$          (2)

In [2] we prove that all SSDMA’s must be homogeneous ($a^m = a^0$, $m \ge 0$)
and can be obtained from $\phi_0, a^0 \in \ell^2$, solutions of

$$ \phi_0[n] = \sqrt{2} \sum_k a^0_k\, \phi_0[2n - k]. $$

We will call this equation the discrete two-scale equation due to its analogy
with (1). Under certain conditions of convergence, an SSDMA leads to an MA [2].
Techniques appearing in [3] for increasing the regularity of $\phi(x)$ can be
applied to study the regularity of the MA generated by an SSDMA.


III.  Biorthogonality 

Let V and $\tilde{V}$ be two DMA’s. We say that V is biorthogonal to
$\tilde{V}$ if and only if $V_m$ is biorthogonal to $\tilde{V}_m$ for
$m \ge 0$, that is, if and only if $\phi_m$ is biorthogonal to
$\tilde{\phi}_m$, with the scalar product as the projection operator. A
necessary and sufficient condition for $V_m$ being biorthogonal to
$\tilde{V}_m$ is that $\phi_1$ be biorthogonal to $\tilde{\phi}_1$, and that
$a^m$ be biorthogonal to $\tilde{a}^m$. If V and $\tilde{V}$ have to be
self-similar, they must verify (2). Let $f \in \ell^2$, and suppose that f is
projected on V. As V and $\tilde{V}$ are biorthogonal, f can be reconstructed
from $\tilde{V}$ and from the projections of f on V. If f must be decomposed in
terms of self-similar functions, at least V has to be self-similar. If
$\tilde{V}$ is not forced to be self-similar, there will be more degrees of
freedom to design the families of biorthogonal discrete-time wavelets.

A special case of discrete-time biorthogonal wavelet can be obtained when
$a^0$ is an interpolation function. From the relationship of this case with
filter bank theory, one can obtain simple filter bank structures verifying the
perfect reconstruction property, solving the drawbacks that interpolation
filters present in this context [4], and pointing to interesting applications
in areas such as multiresolution image and video coding.

IV. Generalization of the Self-Similarity Criterion

The self-similarity condition given in (2) can be extended to

$$ f[2^{\beta+\delta} n] = 2^{-\delta/2}\, g[2^{\beta} n], \quad \beta \in \mathbb{N},\ \delta \in \mathbb{N}^+,\ f \in V_{m+\delta},\ g \in V_m. $$

The most restrictive case, that is, the case that would imply more constraints
on discrete-time wavelets due to self-similarity, corresponds to $\beta = 0$
and leads to (2). Functions in SSDMA’s with different $\beta$’s will have
different grades of self-similarity (GSS). An expression that measures this
property for a given $\beta$ is GSS = $2^{-\beta}$. When relaxing the
self-similarity criterion, the design of families of functions is also made
more flexible. Constraint (2) can be generalized to integer scaling factors
greater than 2. Then, one can obtain SSDMA’s defined by discrete-time
multiwavelets on which biorthogonality conditions similar to those given in the
former section can be imposed.
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Abstract  —  Pa  is  the  class  of  functions  with  a-th 
derivative  bounded  in  L2-norm,  a  >  0.  Kolmogorov 
and  Tichomirov  have  e-specified  any  /  E  Pa  by  a 
0(s~1/a)  bits  length  code  obtained  from  the  Fourier 
(trigonometric)  spectrum  of  /.  We  prove  that  the 
code  can  be  derived  from  /  in  linear  time.  We  show 
that  wavelets  are  equivalent  to  the  trigonometric  ba¬ 
sis  with  respect  to  both  the  length  of  the  code  and 
the  time  to  get  it  from  the  spectrum  (to  within  mul¬ 
tiplicative  constants).  On  the  other  hand,  some  bases 
of  wavelets  outperform  Fourier’s,  if  we  want  to  find 
the  value  of  /  at  some  point  given  the  code  of  /. 

I.  Introduction 

A. Kolmogorov  and  V, Tichomirov  in  collaboration  with 
V. Arnold  introduced  in  [1]  a  compact  class  Pa  of  square  in¬ 
tegrable  functions,  a  >  0.  A  2-7r-periodic  function  /,  /  £  Pa, 
belongs  to  P«,  if 

/»2tt 

/  1/(01*  <1,/  |/(a)(t)|2*  <  i , 

Jo  Jo 

where  / ^  is  the  a-th  derivative  of  /,  a  >  0.  Every  /  was 
given  a  binary  code  through  which  one  can  recover  /  with  e- 
accuracy,  e  — ►  +0,  in  I/2-norm.  The  length  of  the  code  was 
minimal  (to  within  a  multiplicative  constant)  and  equal  to  the 
e-entropy  of  Pa,  which  is  0(e— 1^a).  A  function  /  was  first  ex¬ 
panded  in  trigonometric  (Fourier)  series.  A  partial  sum  of 
the  series  is  a  polynomial  differing  from  /  by  e  in  L2-norm. 
The  set  of  the  coefficients  of  the  polynomial  is  called  the  har¬ 
monic  e-spectrum  of  /.  Kolmogorov-Tichomirov’s  code  of  / 
is  a  compressed  form  of  that  spectrum. 

With  the  minimal  code  known,  the  next  question  arises: 
how  difficult  is  it  to  go  from  /  to  its  code  and  back? 

An  orthonormal  basis  is  chosen  in  L2.  A  function  /  is 
specified  by  it’s  e-spectrum  over  that  basis.  There  are  two 
variants  of  the  above  question.  The  first:  we  want  to  know 
the  running  time  of  computer’s  transforming  the  e-spectrum 
of  /  to  a  code  of  length  0(e~1/f*)-bits,  e-specifying  /.  We  want 
to  know  also  the  running  time  of  computer’s  transforming  the 
code  back  to  the  e-spectrum.  The  second:  we  want  to  know 
the  running  time  of  computer’s  transforming  a  code  of  length 
0(e~l^a  )-bits  of  /  and  a  number  x,0  <  x  <  to  /(x).  I.e., 
what  is  the  time  required  to  compute  a  value  of  /  via  a  code 
of  /?  Our  purpose  is  to  find  out  which  basis  is  best  suited 
for  solving  that  question.  We  will  compare  the  wavelet  bases 
with  Fourier’s. 


II. Main Results

We give the following answers to those two variants of the
question. The first variant: we develop a simple algorithm
that takes a number of bit operations per input bit that is
independent of $\epsilon$ to transform the $\epsilon$-spectrum of a function
into its code of length $O(\epsilon^{-1/\alpha})$ bits. We call the algorithm
simplex, not to be confused with Dantzig's well-known
simplex method. It is optimal to within the constants in O
and in the number of bit operations. The same is
true for the inverse algorithm. There is a wavelet basis which
is as good in solving the first variant of the question as the
Fourier basis, although the constant in O is greater for wavelets.
So, as regards the spectrum-code transformation, wavelets are
equivalent to the trigonometric basis in the sense mentioned.

As regards the calculation of functions via codes of their
spectra (the second variant of the above question), wavelets
outperform the trigonometric basis. Namely, it takes either
$O(\epsilon^{-1/\alpha} (\log 1/\epsilon)^c)$, c > 0, or $O((\log 1/\epsilon)^3)$ bit operations
to compute f(x) given an $O(\epsilon^{-1/\alpha})$-bit code of f, depending
on whether the code is based on the Fourier or the wavelet
spectrum, respectively.

The simplex code plays an important part in our construction.
First of all, it is used to enumerate vectors with integer
coordinates belonging to a multidimensional simplex. Then
the code is applied to a ball and to an ellipsoid. Both the
length and the running time of the simplex code are minimal.
Moreover, one can recover a single coordinate $x_i$, $i = 1, \ldots, p$, of
a vector $(x_1, \ldots, x_p)$, p > 0, rather quickly. This property of
the simplex code is combined with the fact that there are not
many wavelets that do not vanish at a given point. As a result, we
calculate f(x) rapidly if we use the simplex code of a wavelet
expansion of f. The wavelet basis is selected for the class $P_\alpha$.
In contrast, in the trigonometric case we must use all
the terms of the trigonometric polynomial.

One of the open questions: what is the tradeoff between
the length of codes of functions in $P_\alpha$ and the time required
to compute either the code of f or f(x), $0 < x < 2\pi$, given the
spectrum of f?

Given a pair of random vectors X, Y, we study the problem
of finding an efficient or optimal estimator of Y given
X when the range of the estimator is constrained to be a finite
set of values. A generalized vector quantizer (GVQ), with
input dimension k, output dimension m, and size N maps input
$X \in \mathbb{R}^k$ to output $V(X) \in \mathbb{R}^m$. The output V(X) is
constrained to be one of the estimation codevectors in the
codebook, $\{y_1, y_2, \ldots, y_N\}$. The performance of the GVQ is
measured by the average distortion, D = E[d(Y, V(X))], for
a suitable output-space distortion measure $d(\cdot,\cdot)$. A GVQ
reduces to a conventional vector quantizer in the special case
where X = Y. The GVQ problem has been approached in
the information theory literature from many different standpoints.
In particular, it appears in the context of noisy source
coding, which is the special case where we quantize X, the
observable, noisy version of a source, Y.

A GVQ partitions the input space $\mathbb{R}^k$ into N decision
regions or cells. Each cell is mapped by the GVQ to a particular
codevector. In principle, a GVQ is fully characterized
by specifying (a) the input space partition and (b) the codebook.
Correspondingly, one can view the GVQ operation as
the composition of two operations: an encoder, $\mathcal{E}$, which
assigns an index i to each input vector X, and a decoder, V,
which is a table-lookup operation that generates $y_i$, given i.
Thus, $\mathcal{E}$ is a classifier whose performance measure is the
distortion in Y induced by the classification, and V is the
conditional estimator of Y, given the classification index assigned
by $\mathcal{E}$. We summarize the necessary conditions and properties
of the optimal GVQ. However, the optimal encoder has, in
general, unmanageable complexity since its partition regions
may be neither convex nor connected. We therefore propose
to constrain the complexity of the encoder, $\mathcal{E}$, by restricting
its structure. Finding the optimal GVQ subject to the structural
constraint is a hard optimization problem, and to address
it we apply ideas from statistical physics. Although the approach
we propose is extendible to a variety of structures, we
restrict our derivation to the specific structure of the
multiple-prototype classifier, and we refer to such a GVQ system
as the multiple-prototype generalized vector quantizer (MP-GVQ).
In MP-GVQ, a codevector $y_j$ owns $M_j$ prototypes,
$\{x_{j1}, \ldots, x_{jM_j}\}$. The encoding rule finds the nearest prototype
to the input X and maps it to the estimation vector
associated with that prototype. Thus, the encoder partition
region $R_j$ is the union of $M_j$ nearest-neighbor Voronoi cells.

1 This work was supported in part by the National Science Foundation
under grant no. NCR-9314335, the University of California
MICRO program, Rockwell International Corporation, Hughes Aircraft
Company, Echo Speech Corporation, Signal Technology Inc.,
Lockheed Missile and Space Company and Qualcomm, Inc.

The MP-GVQ design problem is to jointly optimize the
prototypes $\{x_{jl}\}$ and codevectors $\{y_j\}$ to minimize the
distortion, D. The problem cannot be solved directly with a
variant of Lloyd's algorithm or by a gradient descent approach,
due to the discrete nature of the classifier partition. We tackle
the problem by introducing a probabilistic framework for the
encoding rule where, for a given input, a probability distribution
is assigned to the set of prototypes and the estimation
vector assigned to the input is determined by the class index
of the randomly chosen prototype. The degree of randomness
is measured by the Shannon entropy. Randomization of
the nearest-neighbor partition subject to a constraint on the
encoder entropy results in the Gibbs distribution for the
encoding rule. The Lagrange parameter, $\gamma$, controls the degree
of randomness, and as $\gamma \to \infty$, the encoding rule approaches
the (non-random) nearest-neighbor rule and the entropy goes
to zero. Furthermore, this Lagrangian framework is extended
to re-formulate the entire MP-GVQ problem as a minimization
of the expected distortion, D, subject to an entropy constraint.
The corresponding Lagrange multiplier, $\beta$, is inversely related
to the temperature in the physical analogy, as explained
below.
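
The randomized encoding rule can be sketched as follows. This is a minimal illustration, assuming squared Euclidean distance between the input and each prototype and a Gibbs distribution of the form p(prototype | x) proportional to exp(-gamma * ||x - prototype||^2); the exact form used by the authors is not given in the text, and the function names and toy prototypes are hypothetical. As gamma grows, the distribution concentrates on the nearest prototype and its entropy falls toward zero.

```python
import math

def gibbs_encoder(x, prototypes, gamma):
    """Probability of choosing each prototype for input x (assumed form).

    p_j is proportional to exp(-gamma * ||x - prototypes[j]||^2).
    """
    d = [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]
    d_min = min(d)  # subtract the minimum for numerical stability
    w = [math.exp(-gamma * (dj - d_min)) for dj in d]
    total = sum(w)
    return [wj / total for wj in w]

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

prototypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # toy values
x = (0.2, 0.1)
for gamma in (0.0, 1.0, 100.0):
    p = gibbs_encoder(x, prototypes, gamma)
    print(gamma, [round(v, 3) for v in p], round(entropy(p), 3))
```

At gamma = 0 the rule is uniform over the prototypes (maximum entropy); for large gamma it behaves like the deterministic nearest-neighbor rule.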

The  method  consists  of  starting  with  a  highly  random  en¬ 
coder  (large  value  of  the  entropy  constraint)  and  gradually 
reducing  the  entropy  while  solving  the  optimization  at  each 
level.  At  the  limit  of  zero  entropy,  we  obtain  a  deterministic 
solution  satisfying  the  structural  constraint  and  minimizing 
the  output  distortion. 

This is an annealing process corresponding to the physical
analogy where a system whose energy is the output distortion
and whose temperature is inversely related to the Lagrange
multiplier, $\beta$, is gradually cooled down to zero temperature.
This  analogy  also  explains  the  ability  of  the  method  to  avoid 
many  local  minima  that  riddle  the  distortion  surface.  The 
physical  analogy  is  taken  a  step  further  by  observing  that  the 
system  undergoes  phase  transitions  in  the  sequence  of  solu¬ 
tions  obtained  for  decreasing  values  of  entropy.  These  tran¬ 
sitions  correspond  to  an  increase  in  the  effective  size  of  the 
model  (the  number  of  distinct  codevectors  found  in  the  so¬ 
lution  for  each  entropy  value).  We  provide  a  result  yielding 
the  critical  temperature  (at  which  a  set  of  codevectors  “split” 
into  a  larger  set)  as  a  function  of  the  covariances  and  cross¬ 
covariances of X and Y in the respective clusters. The result
extends the original results for phase transitions of the
deterministic annealing process previously studied for conventional
vector quantizer design.

We demonstrate the usefulness of our MP-GVQ design procedure
for a variety of examples from the source coding literature.

Abstract — A new form of trellis coded quantization
is presented based on uniform quantization thresholds
and “on-the-fly” codeword training. The universal
trellis coded quantization (UTCQ) technique requires
no stored codebooks. UTCQ performance is comparable
with fully optimized ECTCQ for most rates.
Performance for the memoryless Gaussian source is
presented.

TCQ  has  been  shown  to  be  an  effective  quantizer  for  mem¬ 
oryless  sources  with  low  to  moderate  complexity  [1].  ECTCQ 
was  developed  in  [2,  3]  and  achieves  MSE  performance  near 
(within  about  0.5  dB)  the  rate-distortion  bound  of  the  mem¬ 
oryless  Gaussian  source,  at  all  non-negative  encoding  rates. 

In [4], the TCQ subset labelling of Figure 1 was introduced.
This index shift makes the quantizer symmetric with respect
to codebook supersets ($S_0 = D_0 \cup D_2$ and $S_1 = D_1 \cup D_3$). With
the modified labelling, both supersets have access to a zero
codeword.

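[Figure: uniform codeword grid at $-3\Delta$, $-2\Delta$, $-\Delta$, 0, $\Delta$, $2\Delta$, $3\Delta$, with index labels for supersets $S_0$ and $S_1$; the labels are only partly recoverable from the source.]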

In [4] a system similar to UTCQ was presented. There
all codewords were trained using a training sequence and the
codebooks stored. Figure 2 gives the relative distortion between
UTCQ when training four codewords versus training all
of the codewords. By simply training four codewords, UTCQ
achieves virtually the same performance and stores no codebooks.
Furthermore, by training on the sequence data itself,
UTCQ may perform better when the source statistics are not
precisely known.

Fig. 1: Modified Subset Labels

The following relationships are evident (assuming a symmetric
pdf),

[equation not recoverable from the source]  (1)

$P_{S_0}[CW_i] = P_{S_1}[CW_{-i}]$.  (2)

These relationships allow the use of a single variable-rate code
for both supersets [5]. The encoder returns the $S_0$ indices and
the negative of the $S_1$ indices. The decoder may uniquely
recover the index stream by tracking the trellis state.

UTCQ uses uniform thresholds and codewords for quantization.
The encoder is completely characterized by $\Delta$ (see
Figure 1). For $CW_i$ ($|i| > 2$), the decoder uses uniform codewords.
The remaining codewords are trained on the actual
sequence being encoded, except $CW_0 = 0$ (see footnote 2). The trained
codewords are determined by taking the mean of all samples mapping
to $i \in S_0$ and the negative of those mapping to $-i \in S_1$.
These codewords are quantized within their cells using 256
levels and passed to the decoder. This quantization requires
a four-byte overhead and guarantees that the quantized codewords
are within 0.4% of their trained values.

1 This work was supported in part by SAIC and by the National
Science Foundation under Grant No. 9258374.

2 Although suboptimum in an MSE sense, we have found this
sometimes results in perceptual improvements when used in image
coding applications.


Fig. 2: UTCQ Memoryless Gaussian Performance

Abstract — A method is proposed for designing a
maximum mutual information (MMI) vector quantizer
for applications in which quantization is used
to extract a set of discrete features for use in
classification.


I.  Introduction 

Vector  quantization  is  commonly  used  as  a  feature  extraction 
technique  for  classification.  Typically,  the  vector  quantizer  for 
feature  extraction  is  designed  identically  to  a  vector  quantizer 
for  coding,  that  is,  to  achieve  a  minimum  distortion  represen¬ 
tation  of  the  original  data  [3].  While  this  type  of  quantizer 
has  proven  successful  as  a  feature  extraction  technique  for 
recognition  systems,  it  seems  reasonable  to  question  whether 
such  minimum  distortion  quantizers  are  actually  optimal  for 
feature  extraction. 

II.  MMI  Vector  Quantization 

We propose a technique for designing a maximum mutual
information (MMI) quantizer which maximizes the mutual
information $I((X, C); Q)$ between data X labeled with class C,
and the quantization rule Q. We consider the case when the
quantization rule $Q(X, C) \in \{1, \ldots, K\}$ is a function of the
data and class label, as well as the case when Q(X) is a function
of only the data. The quantization rule Q(X, C) or Q(X)
is based on centroids $Y_1, \ldots, Y_K$ associated with each
quantization index. The mutual information $I((X, C); Q)$ between
the data and the quantizer is given by [1]

$I((X, C); Q) = H(X, C) - H(X, C | Q)$  (1)

where $H(X, C | Q)$ is the conditional entropy of X and C given
the quantization Q. Since H(X, C) does not depend on the
quantization, finding the quantizer to maximize the mutual
information between (X, C) and Q is equivalent to finding the
quantizer to minimize $H(X, C | Q)$.

Now $P(X, C | Q) = P(X | Q) P(C | X, Q)$. We make the
simplifying assumption that $P(C | X, Q) = P(C | Q)$. Thus

$H(X, C | Q) = H(X | Q) + H(C | Q)$.  (2)

Let $P(X | Q)$ be Gaussian, with mean $Y_Q$ and identity covariance.
Then the quantizer which maximizes the mutual information
between (X, C) and Q can be found by minimizing
(up to an additive constant that does not depend on the quantizer)

$H(X, C | Q) = \tfrac{1}{2} E\{(X - Y_Q)^2\} - E\{\log P(C | Q)\}$.  (3)

A class-dependent quantization rule Q(X, C) can be designed
with an MMI criterion by using the standard k-means
algorithm [2] to find the centroids $Y_1, \ldots, Y_K$ that minimize
the MMI distortion

$d_{MMI}(X, C; Q) = \tfrac{1}{2}(X - Y_Q)^2 - \log P(C | Q)$,  (4)

averaged over the labeled training data
$(X_1, C_1), \ldots, (X_N, C_N)$. The second term in $d_{MMI}(X, C; Q)$
requires an estimate of $P(C | Q)$, obtained empirically based
on the class labels of the training data.

In practice, the class labels of the data are unknown before
quantization. Thus the quantization rule Q(X) must be a
function of only the data X. We assume that the form of the
quantization rule for X is to choose the quantization index of
the centroid $Y_Q$ that has minimum Euclidean distance

$d(X; Q) = \tfrac{1}{2}(X - Y_Q)^2$.  (5)

The quantizer design involves finding the centroids $\{Y_k\}$ to
minimize (3). More precisely, since the expectation is over the
empirical distribution observed in some labeled training data
$(X_1, C_1), \ldots, (X_N, C_N)$, we in fact seek to minimize

$J = \sum_{i=1}^{N} \left[ \tfrac{1}{2}(X_i - Y_{Q(X_i)})^2 - \log P(C_i | Q(X_i)) \right]$.  (6)


Since the criterion for estimating the centroids (Eq. 6) is
now different from the distortion measure used to assign vectors
to centroids (Eq. 5), the simple k-means algorithm cannot
be used for optimizing the centroids. Instead, we will use a
gradient descent procedure. Estimating $P(C | Q)$ using simply
a count of the samples with quantization index Q and class C,
as in the previous section, yields a function which is
piecewise-constant with respect to the centroids $\{Y_k\}$, and thus is not
amenable to gradient descent. Instead, we use the estimate

$P(C = m | Q = k) = \frac{\sum_{i=1}^{N} \delta_{C_i, m} P(Q = k | X_i)}{\sum_{i=1}^{N} P(Q = k | X_i)}$  (7)

where $\delta_{C_i, m}$ is the Kronecker delta: $\delta_{C_i, m} = 1$ if $C_i = m$, 0 if
$C_i \ne m$. Now

$P(Q = k | X_i) = \frac{P(X_i | Q = k) P(Q = k)}{P(X_i)}$.

As before, $P(X | Q = k)$ is Gaussian, with mean $Y_k$ and identity
covariance, and $P(X_i) = \sum_{k} P(X_i | Q = k) P(Q = k)$.

We assume $P(Q = k) = 1/K$.

Using these forms for $P(Q = k | X_i)$ and $P(C = m | Q = k)$
in (6), the gradient of J with respect to the cluster centroids
$\{Y_k\}$ can be computed. A standard gradient descent procedure
is then applied to minimize J and hence design the
codebook.
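
The soft estimate in (7) and the criterion in (6) can be sketched as below. This is a minimal illustration under the assumptions stated in the text (unit-covariance Gaussian P(X | Q = k), uniform P(Q = k) = 1/K, nearest-centroid assignment); the function names and toy data are hypothetical, and no gradient step is shown.

```python
import math

def sqdist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def posterior_q(x, centroids):
    """P(Q = k | x) for unit-covariance Gaussians and uniform P(Q = k)."""
    d = [sqdist(x, y) for y in centroids]
    d_min = min(d)  # shift exponents for numerical stability
    w = [math.exp(-0.5 * (dk - d_min)) for dk in d]
    s = sum(w)
    return [wk / s for wk in w]

def class_given_q(data, labels, centroids, num_classes):
    """Soft estimate of P(C = m | Q = k), following the form of (7)."""
    K = len(centroids)
    num = [[0.0] * num_classes for _ in range(K)]
    den = [0.0] * K
    for x, c in zip(data, labels):
        post = posterior_q(x, centroids)
        for k in range(K):
            num[k][c] += post[k]
            den[k] += post[k]
    return [[num[k][m] / den[k] for m in range(num_classes)] for k in range(K)]

def criterion(data, labels, centroids, num_classes):
    """J as in (6): squared-error term minus log P(C_i | Q(X_i))."""
    p_cq = class_given_q(data, labels, centroids, num_classes)
    total = 0.0
    for x, c in zip(data, labels):
        d = [sqdist(x, y) for y in centroids]
        q = d.index(min(d))  # nearest-centroid rule, as in (5)
        total += 0.5 * d[q] - math.log(p_cq[q][c])
    return total

# Toy one-dimensional data with two classes (illustrative values only).
data = [(0.0,), (0.2,), (3.0,), (3.1,)]
labels = [0, 0, 1, 1]
centroids = [(0.1,), (3.05,)]
p_cq = class_given_q(data, labels, centroids, num_classes=2)
J = criterion(data, labels, centroids, num_classes=2)
print(p_cq, J)
```

Because the posterior weights vary smoothly with the centroids, this estimate of P(C | Q) is differentiable in the centroids, unlike a hard count.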
Abstract — A robust quantizer is proposed for transmission
over a binary symmetric channel (BSC). The
quantization scheme combines the Gaussian channel-optimized
scalar quantizer (COSQ) with all-pass
filtering before and after quantizing. For a broad class
of sources the resulting performance is approximately
that of the Gaussian COSQ for the memoryless Gaussian
source.

I.  Introduction 

There are several approaches to designing scalar quantizers
(SQs) and vector quantizers (VQs) for use over a binary symmetric
channel [1]-[8]. A comparison of the performance of
these methods leads to the following conclusion: for the
encoding of memoryless (generalized Gaussian) sources, if the
channel bit error rate is significant (larger than about $10^{-3}$),
very little improvement over channel-optimized scalar quantization
has been achieved.

Figure 1 compares the performance of COSQ to the
distortion-rate function evaluated at the channel capacity
(termed the optimal performance theoretically attainable
(OPTA)) for Gaussian, Laplacian, and generalized Gaussian
(with shape parameter $\nu = 0.5$) sources. Two features are
evident. The first is that there is a large potential performance
gain whenever the BSC bit error rate is significant.
The second is that the general ordering of the COSQ performance
curves for Gaussian, Laplacian, and generalized Gaussian
($\nu = 0.5$) sources is exactly opposite to the ordering of
their respective OPTA curves.
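
For the Gaussian case, the OPTA value can be computed in closed form. The sketch below assumes a unit-variance memoryless Gaussian source with squared-error distortion, so that D(R) = 2^(-2R), and a hypothetical parameter `bits_per_sample` for the number of BSC uses per source sample (the text does not state the rate used in Figure 1).

```python
import math

def bsc_capacity(eps):
    """Capacity of a BSC with crossover probability eps, in bits per use."""
    if eps in (0.0, 1.0):
        return 1.0
    h = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    return 1.0 - h

def gaussian_opta_snr_db(eps, bits_per_sample):
    """SNR in dB of D(R) evaluated at R = bits_per_sample * C."""
    rate = bits_per_sample * bsc_capacity(eps)
    distortion = 2.0 ** (-2.0 * rate)  # unit-variance Gaussian source
    return -10.0 * math.log10(distortion)

for eps in (0.001, 0.01, 0.1):
    print(eps, round(gaussian_opta_snr_db(eps, bits_per_sample=3), 2))
```

On an error-free channel this reduces to the familiar figure of about 6.02 dB per bit; as the bit error rate approaches 0.5 the capacity, and hence the attainable SNR, goes to zero.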


Figure  1:  OPTA  and  COSQ  performance. 


II.  Robust  Quantization 

All-pass filtering can be used to change the marginal distribution
of a source sequence into one that is approximately
Gaussian [9], [10]. Since the transformation is unitary, the
proper concatenation of all-pass filtering, quantization using
the Gaussian COSQ, and inverse filtering (at the receiver)
provides consistent quantization performance, at the level of
the Gaussian COSQ, for a wide variety of source distributions.

Table 1 compares the robust quantization performance to the
COSQ performance [1] for several sources. The all-pass filtering
was done using the binary phase scrambling method
[10].


Source   $\epsilon$ = 0.001   $\epsilon$ = 0.010   $\epsilon$ = 0.100
GG            9.18           7.23           2.32
Lap          12.09           9.18           3.79
Gaus         13.99          10.57           4.69
P-GG         13.96          10.57           4.68
P-Lap        13.98          10.56           4.67

Table 1: SNR (in dB) for memoryless Gaussian, Laplacian,
and generalized Gaussian (with shape parameter
$\nu = 0.5$) sources for COSQ and the robust quantization
method (P-GG, P-Lap).


References 

[1] N. Farvardin and V. Vaishampayan, “Optimal Quantizer Design
for Noisy Channels: An Approach to Combined Source-Channel
Coding,” IEEE Trans. on Information Theory, vol. IT-33, no.
6, pp. 827-838, Nov. 1987.

[2] N. Farvardin and V. Vaishampayan, “On the performance
and complexity of channel-optimized vector quantizers,” IEEE
Trans. on Information Theory, vol. IT-37, no. 1, pp. 155-160,
Jan. 1991.

[3] E. Ayanoglu and R. M. Gray, “The design of joint source and
channel trellis waveform coders,” IEEE Trans. on Information
Theory, vol. IT-33, pp. 855-865, Nov. 1987.

[4] M. Wang and T. R. Fischer, “Trellis coded quantization
designed for noisy channels,” IEEE Trans. on Information
Theory, to appear.

[5] N. Farvardin, “A study of vector quantization for noisy
channels,” IEEE Trans. on Information Theory, vol. IT-36, no. 4,
pp. 799-809, July 1990.

[6] K. Zeger and A. Gersho, “Pseudo-Gray Coding,” IEEE Trans.
on Communications, vol. COM-38, no. 12, pp. 2147-2158, Dec.
1990.

[7] R. Hagen and P. Hedelin, “Robust vector quantization by a
linear mapping of a block code,” submitted to IEEE Trans. on
Information Theory.

[8] P. Knagenhjelm, “Competitive learning in robust communication,”
Ph.D. dissertation, Chalmers University (Sweden), 1993.

[9] A. C. Popat and K. Zeger, “Robust quantization of memoryless
sources using dispersive FIR filters,” IEEE Trans. on
Communications, vol. 40, pp. 1670-1674, Nov. 1992.

[10] C. J. Kuo and C. S. Huang, “Robust coding technique-transform
encryption coding for noisy communications,” Optical
Engineering, vol. 32, no. 1, pp. 150-156, Jan. 1993.


1 This work was supported by NSF Grant NCR-9303868.




Soft Decoding for Vector Quantization in Combination with Block Channel Coding


Mikael  Skoglund  and  Per  Hedelin 

Department  of  Information  Theory,  Chalmers  University  of  Technology,  S  -  412  96  Goteborg,  Sweden 


I.  INTRODUCTION 

According  to  the  two-step  source/channel  coding  procedure 
introduced  by  Shannon,  the  source  and  the  channel  codes  are 
designed  and  used  separately.  Recent  research  has  striven  to  find 
efficient  combined  approaches  for  source/channel  coding.  Much  of 
this  research  has  considered  vector  quantization  (VQ)  for  noisy 
channels.  In  this  paper  we  present  a  method  for  joint  decoding  of 
the  combination  of  a  vector  quantizer  and  a  channel  code.  Our 
decoder  is  soft  in  the  sense  that  no  decisions  are  involved  in  the 
decoding,  and  the  unquantized  channel  outputs  are  utilized  (c.f.  [1, 
2]  and  [3]).  We  depart  from  the  traditional  way  of  decoding,  in  that 
we  make  the  decoding  into  a  one-step  procedure,  without  any 
intermediate  channel  decoding.  A  similar  approach  for  scalar 
quantization  and  a  discrete  channel  was  presented  in  [4].  We  will 
also  demonstrate  that  estimates  of  the  transmitted  binary  data  can 
be  efficiently  obtained  in  our  framework. 

II. Block Source and Channel Encoding

Consider a VQ encoder in tandem with a block channel encoder.
Assume that the VQ encoder, $\alpha$, maps $\mathbb{R}^d$ onto
$\mathcal{I}_N = \{0, 1, \ldots, N-1\}$, where $N = 2^k$, and that the binary
representation, $b(i) \in \{\pm 1\}^k$, of the chosen index, $i = \alpha(x)$, for the
source vector, $x \in \mathbb{R}^d$, is encoded into a channel codeword. Let
$P_i = \Pr(\alpha(X) = i)$. The channel encoder is described by the
mapping $\beta: \mathcal{I}_N \to \mathcal{I}_M$, where $\mathcal{I}_M = \{0, 1, \ldots, M-1\}$, $M = 2^n$ and
$n > k$, such that $i' = \beta(i)$. We take the channel codeword,
$c(i') \in \{\pm 1\}^n$, to be the binary representation of the index $i'$. The
two mappings of the VQ encoder and the channel encoder can be
joined into one mapping, $\varepsilon: \mathbb{R}^d \to \mathcal{I}_M$, where $\varepsilon = \beta \circ \alpha$. With this
mapping we associate the probabilities $P'_{i'} = \Pr(I' = i')$, such that
$P'_{i'} = P_i$ if $i' = \beta(i) \in \beta(\mathcal{I}_N)$, and $P'_{i'} = 0$ if $i' \in \mathcal{I}_M \setminus \beta(\mathcal{I}_N)$.

Consequently, the tandem of the original VQ encoder and the
channel encoder is equivalent to a new VQ encoder, $\varepsilon$, for which
a subset of the index probabilities is equal to zero.

III. Optimal Soft Decoding

Assume that the channel corrupts the transmitted codeword with
AWGN. The received vector, $R = (R_1, R_2, \ldots, R_n)^T$, can then be
expressed as $R = a \cdot c(i') + W$, where a is a known amplitude and
W is Gaussian with covariance matrix $\sigma^2 I$. This model is valid for
binary modulation in AWGN, in which case R corresponds to samples of
the matched filter output at the receiver. The main result of this
paper is a Hadamard-based expression for the MMSE soft decoder
decoding the source/channel encoder. We use the word soft to
emphasize that the decoder utilizes the unquantized channel output,
the vector R.

The decoder function, $\hat{X}$, that minimizes $E\|X - \hat{X}\|^2$ can easily
be shown to be $\hat{X}(r) = E[y_{I'} | R = r]$, where $y_i = E[X | I' = i]$. This
expression can be rewritten using a Hadamard-transform approach.
For this purpose we express the vector $y_i$ as $y_i = T \cdot h_i$, where $h_i$ is
the ith column of an M by M Hadamard matrix H. The matrix T is
fully specified by the vectors $y_i$ (cf. [3]). Thus the MMSE
decoder can be written $\hat{X}(r) = T \cdot \hat{h}(r)$, where $\hat{h}(r) = E[h_{I'} | R = r]$.
Using this expression it can be shown that optimal soft decoding
can be based on estimates, $\hat{b}(r_m) = \tanh(a r_m / \sigma^2)$, of the individual
bits of the codeword $c(i')$. The bit estimates are used to build a
vector p(r), according to $p(r) = (1, \hat{b}(r_n))^T \otimes \cdots \otimes (1, \hat{b}(r_1))^T$, where
$\otimes$ denotes the Kronecker matrix product. It can then be shown that
the expression for the vector $\hat{h}(r)$ becomes $\hat{h}(r) = f(r) \cdot R_{hh} \cdot p(r)$,
where $R_{hh} = E[h_{I'} h_{I'}^T]$, and the scalar function f(r) is defined as
$f(r) = (m_h^T \cdot p(r))^{-1}$, where $m_h = E[h_{I'}]$. By modifying an
algorithm given in [5] to the framework of the present study, the
calculation of $\hat{h}(r)$, based on the received vector r, can be carried
out using on the order of $n \cdot 2^n$ operations.
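
The bit estimates and the vector p(r) can be sketched as follows. This is a minimal illustration that implements only the tanh bit estimate and the Kronecker-product construction described above; the function names and the sample values of r, a, and the noise variance are hypothetical, and the remaining steps (multiplying by R_hh and scaling by f(r)) are not shown.

```python
import math

def soft_bits(r, a, sigma2):
    """Soft estimate tanh(a * r_m / sigma^2) for each received sample r_m."""
    return [math.tanh(a * rm / sigma2) for rm in r]

def kron(u, v):
    """Kronecker product of two vectors given as lists."""
    return [ui * vj for ui in u for vj in v]

def build_p(bits):
    """Build p = (1, b_n) (x) ... (x) (1, b_1) from bits = [b_1, ..., b_n]."""
    p = [1.0]
    for b in bits:  # b_1 first, so b_n ends up as the leftmost factor
        p = kron([1.0, b], p)
    return p

r = [0.5, -1.0, 2.0]          # toy received vector, n = 3
b_hat = soft_bits(r, a=1.0, sigma2=1.0)
p = build_p(b_hat)            # length 2**n
hard = [1 if b >= 0 else -1 for b in b_hat]
print(len(p), hard)
```

With this ordering, entry 2^m of p (counting from zero) is the soft estimate of bit m + 1, and the other entries are products of soft bit estimates.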

Traditionally, decoding is based on hard bit estimates that are
calculated from the received signal. Our approach performs
decoding in a single-step procedure, with no hard decisions
involved. However, for applications where hard bit values are
desired, symbol-by-symbol MAP estimates of the bits can easily be
obtained from $\hat{h}(r)$. Since the vector $\hat{h}(r)$ will have MMSE
estimates of the information bits in positions $2^m$, $m = 0, \ldots, k-1$
(assuming that the channel code is given in systematic form), we
obtain the hard bit estimates as $\hat{b}_{MAP}(m) = \mathrm{sign}(\hat{h}_{2^m})$, where $\hat{h}_n$
denotes the nth component of $\hat{h}(r)$. In this paper we investigate the
VQ performance in terms of SNR, but we emphasize that
transmission of binary data is also easily treated in the soft
Hadamard-based framework.


IV. Results

Figure 1. SNR in dB, as a function of the channel SNR (CSNR).

A  simple  example  is  illustrated  in  fig.  1.  In  this  example,  a  4  bit  4- 
dimensional  VQ  trained  for  a  first  order  Gauss-Markov  source 
having  the  correlation  0.9  between  samples  is  used  in  tandem  with 
the  Hamming  (7,4)  block  channel  code.  The  simulation  shows  the 
soft  decoder  (upper  curve),  and  decoding  based  on  a  two-stage 
procedure  which  first  decodes  the  channel  code  with  soft  decision 
ML-decoding,  and  then  uses  the  decision  to  perform  table-look-up 
VQ  decoding  (lower  curve).  As  we  can  see  our  soft  decoder 
performs  better  than  the  two-stage  procedure,  and  the  difference 
becomes  larger  for  bad  channels.  This  difference  is  mainly  due  to 
the  fact  that  knowledge  of  the  CSNR  and  the  source  statistics  is 
utilized in the soft decoder. Furthermore, soft decoding is favorable
in principle, since ML detection-based decoding destroys
information when taking hard decisions. This information can be
utilized by the soft decoder to enhance the performance of the
decoding of the VQs.

Abstract — The trellis-based scalar-vector quantizer
(TB-SVQ) for memoryless sources was introduced by
Laroia and Farvardin and outperforms all other
reasonable-complexity fixed-rate quantizers. Unfortunately,
the resulting code is catastrophic: a single bit
error within a block can propagate indefinitely into
other blocks. This paper presents a new algorithm,
termed a fixed-rate trellis source code (FRTSC), that
achieves essentially the same, or in some cases better,
performance as the TB-SVQ for error-free channels,
but limits the propagation of channel errors.

I.  Introduction 

Vector quantizers can achieve various gains over uniform
scalar quantizers. These gains are classified into boundary
(entropy) gain, granular gain and non-uniform density gain
[1], [5]. The scalar-vector quantizer (SVQ) [4], introduced by
Laroia and Farvardin, is a structured vector quantizer which
can achieve both boundary gain and non-uniform density gain
without infinite error propagation due to transmission bit
errors. The trellis coded quantizers introduced by Marcellin and
Fischer [2] are effective structured multidimensional quantizers
that realize a significant portion of the ultimate granular
gain. Laroia and Farvardin combined the SVQ with TCQ to
realize these three gains. The resulting quantizer is called the
trellis-based scalar-vector quantizer (TB-SVQ) [1].

II. The Trellis-Based Scalar-Vector Quantizer
(TB-SVQ)

Laroia  and  Farvardin  impose  two  constraints  on  the  TB-SVQ 
design  so  that  the  TB-SVQ  enumeration  encoding  is  state- 
independent  and  the  SVQ  enumeration  algorithm  can  be  ap¬ 
plied  directly  in  the  TB-SVQ.  This  elegant  formulation  avoids 
the  difficulty  of  state-dependent  enumeration,  but  unfortu¬ 
nately  yields  a  catastrophic  code. 

Lemma  1.  Given  the  same  binary  SVQ  codeword,  different 
initial  states  at  the  beginning  of  a  block  can  cause  the  TB-SVQ 
decoder  to  produce  different  TB-SVQ  code-sequences  with  dif¬ 
ferent  ending  states  at  the  end  of  the  block. 

Theorem  1.  The  TB-SVQ  is  a  catastrophic  code,  whether 
or  not  a  feedback-free  encoder  is  used. 


Let m be the dimension per block, r the bit rate per dimension,
and $\mu$ the constraint length of the convolutional encoder.
Following the notation in [1], let L(s,t) denote the
length threshold for the m-vectors with initial state s and
final state t. The thresholds L(s,t) are selected so that no more than rm bits
are used to encode each m-vector. For each block, given initial
state s, out of the rm available bits, $\mu$ bits are used to specify
the ending state, and the remaining $rm - \mu$ bits are used to
code the trellis sequences with initial state s and final state t.

Infinite error propagation due to channel transmission errors
is avoided in the FRTSC because the ending state is
explicitly coded and transmitted. Simulation shows that the
FRTSC achieves performance similar to that of the TB-SVQ for Gaussian
and Laplacian sources at encoding rates of r = 1, 2, 3
bits per sample. For sharp-peaked, broad-tailed sources, like
the generalized Gaussian with shape parameter $\alpha = 0.5$, some
performance improvement is achieved. The improvement is as
large as 0.8 dB for a 4-state trellis and an encoding rate of
r = 3.

References 

[1] R. Laroia and N. Farvardin, "Trellis-based scalar-vector
quantizer for memoryless sources," IEEE Trans. Inform. Theory,
vol. 40, pp. 860-870, May 1994.

[2] M. W. Marcellin and T. R. Fischer, "Trellis coded quantization
of memoryless and Gauss-Markov sources," IEEE Trans. Commun.,
vol. 38, pp. 82-93, Jan. 1990.

[3] T. R. Fischer and J. Pan, "Enumeration encoding and decoding
algorithms for pyramid cubic lattice and trellis codes,"
submitted to IEEE Trans. Inform. Theory.

[4] R. Laroia and N. Farvardin, "A structured fixed-rate vector
quantizer derived from a variable-length scalar quantizer - Part
I: Memoryless sources," IEEE Trans. Inform. Theory, vol. 39,
pp. 851-867, May 1993.

[5] M. V. Eyuboglu and G. D. Forney, Jr., "Lattice and trellis
quantization with lattice- and trellis-bounded codebooks -
High-rate theory for memoryless sources," IEEE Trans. Inform.
Theory, vol. 39, pp. 46-59, Jan. 1993.


III.  A  Fixed-Rate  Trellis  Source  Code 

A  new  algorithm,  termed  a  fixed-rate  trellis  source  code 
(FRTSC),  follows  the  basic  idea  of  the  TB-SVQ  for  combining 
the  SVQ  with  TCQ,  but  differs  in  at  least  two  ways.  The  first 
is  that  no  constraints  are  imposed  on  the  SVQ  alphabet  as 
in  TB-SVQ.  This  more  general  setting  allows  a  zero  level  to 
be  included  easily  as  a  quantization  level,  and  potentially  pro¬ 
vides  performance  improvement  over  the  TB-SVQ.  The  second 
difference  is  that  a  state-dependent  enumeration  algorithm  is 
used,  which  is  a  generalization  of  the  enumeration  developed 
for  pyramid  trellis  codes  in  [3].  This  new  enumeration  explic¬ 
itly  specifies  the  ending  state  for  each  block. 

1This  work  was  supported  by  NSF  Grant  NCR-9303868 
Abstract

Why do vector quantizers outperform scalar quantizers when
the source is stationary and memoryless? This question is
frequently asked by newcomers to VQ, who recognize that, in this
case, its ability to exploit correlation is of no use. An
interesting approach (cf. [1]) is to compare a k-dimensional
VQ with rate R to the k-dimensional product quantizer (PQ)
induced by applying a scalar quantizer (SQ) with rate R to k
successive source samples. It is then evident that one
advantage of VQ is that its cells are more spherical than those
of the PQ, which are rectangular. Another is that the points of
the VQ are better distributed. Indeed, it is often thought that
the PQ distributes points in a "cubic" fashion, whereas the VQ
matches its point distribution to the source, e.g. spherical for a
Gaussian density.

Using asymptotic quantization theory, we show that, aside
from the rectangularity of the induced PQ's cells, the
shortcoming of SQs is not that they are incapable of inducing a PQ
with an optimal point density. Rather, the structure of the PQ links
the point density and cell shapes in a way that causes the best
SQ to be a compromise between that which induces the best
point density and that which induces the best cell shapes.
Consequently, the optimum SQ suffers a point density loss and
a cell shape loss. For large rates, we find formulas for these
and evaluate them in the Gaussian and Laplacian cases. For
example, in the Gaussian case, relative to high-dimensional
VQ, an SQ has a 1.88 dB "point density" loss, a 1.53 dB
"cubic" loss and a 0.94 dB "oblongitis" loss.


Summary of Results

Applying an SQ with $N_1$ points and point density $\lambda_1(x_1)$ to
k successive source samples induces a k-dimensional PQ with
$N_1^k$ points. Its point density can be shown to be

$$\lambda_{pr}(x) = \lambda_1(x_1) \cdots \lambda_1(x_k), \quad \text{where } x = (x_1, \ldots, x_k).$$

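Its cells are rectangular (cubic on the diagonals), and the effect
of their shapes on the mean squared error (MSE) is contained in
the inertial profile $m_{pr}$, which equals the normalized moment of
inertia (nmi) of the cells in the vicinity of x. It can be shown that

$$m_{pr}(x) = \frac{1}{12} \cdot \frac{\frac{1}{k} \sum_{i=1}^{k} \lambda_1(x_i)^{-2}}{\left( \prod_{i=1}^{k} \lambda_1(x_i)^{-2} \right)^{1/k}} \ge \frac{1}{12}.$$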

Notice that $\lambda_1$ affects both $\lambda_{pr}$ and $m_{pr}$. In comparison, an
optimal k-dimensional VQ for a stationary memoryless source
with first-order density $p_1(x_1)$ and kth-order density $p(x)$ has
point density [2]

$$\lambda_k(x) = c\, p(x)^{k/(k+2)} = c\, p_1(x_1)^{k/(k+2)} \cdots p_1(x_k)^{k/(k+2)},$$

where c is a constant. Its inertial profile is $m_k(x) = M_k$, where
$M_k$ is the least nmi of k-dimensional polytopes that tessellate.

To quantitatively assess the suboptimality of the point density
and inertial profile of a PQ, consider the ratio of its MSE,
$D_{pr}$, to the MSE, $D_{k,N}$, of an optimal k-dimensional VQ, which
we call the loss L. Using the vector version of Bennett's
integral [2] and assuming $N_1$ is large, we find

$$L \triangleq \frac{D_{pr}}{D_{k,N}} = \frac{\displaystyle\int \frac{m_{pr}(x)}{\lambda_{pr}(x)^{2/k}}\, p(x)\, dx}{\displaystyle\int \frac{M_k}{\lambda_k(x)^{2/k}}\, p(x)\, dx}.$$


It is useful to factor this loss into three terms:

$$L = L_{pt} \times L_{cu} \times L_{ob}.$$


The point density loss, $L_{pt}$, is the ratio of the MSE of a VQ
with point density $\lambda_{pr}$ and a constant (e.g. optimal) inertial
profile to that of a VQ with optimum point density and the
same inertial profile. The cubic loss, $L_{cu}$, is the ratio of the
MSE of a VQ with cubic cells to one whose cells have nmi equal
to $M_k$ and the same point density. The oblongitis loss, $L_{ob}$, is
the ratio of the MSE of the PQ to that of a VQ with the same
point density, but cubic cells; i.e. it is due to rectangularity.
The product of cubic and oblongitis losses is the cell shape
loss.

To optimize the PQ, the scalar point density $\lambda_1$ must be
chosen to minimize $L_{pt} L_{ob}$. On the one hand, choosing $\lambda_1$ to
be uniform minimizes $L_{ob}$. On the other hand, choosing
$\lambda_1(x_1) = c'\, p_1(x_1)^{k/(k+2)}$ minimizes $L_{pt}$. In this case,
$\lambda_{pr} = \lambda_k$, but there is so much "oblongitis" that
$L_{ob} = \infty$. The best scalar point density,
$\lambda_1(x_1) = c\, p_1(x_1)^{1/3}$, is a compromise. It is more
uniform than the point density that minimizes $L_{pt}$, which reduces
"oblongitis". The fact that a PQ can have the optimal point
density is often overlooked, probably due to the "squarish"
arrangement of its points.

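For the optimal scalar point density, formulas for the point
density and oblongitis losses can be straightforwardly derived.
For a Gaussian density these reduce to

$$L_{pt} = 3 \left( \frac{3k}{3k-2} \right)^{k/2} \left( \frac{k}{k+2} \right)^{(k+2)/2} \to 3e^{-2/3}$$

$$L_{ob} = \sqrt{3} \left( \frac{3k-2}{3k} \right)^{k/2} \to \sqrt{3}\, e^{-1/3} \quad \text{as } k \to \infty$$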

which are listed in Table 1, along with $L_{cu}$, for various k. For a
Laplacian density, the point density and oblongitis losses are the
squares of those for the Gaussian density. They are larger (by a
factor of 2 in dB) because the sharper peak and heavier tails
cause an optimal SQ to be more nonuniform.
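
As a quick numerical check, the Gaussian-case losses can be evaluated directly. The sketch below assumes the closed forms L_pt = 3 (3k/(3k-2))^(k/2) (k/(k+2))^((k+2)/2) and L_ob = sqrt(3) ((3k-2)/(3k))^(k/2); the function names are illustrative and not taken from the paper.

```python
import math


def point_density_loss(k):
    """Assumed closed form of L_pt (as a ratio) for a Gaussian source."""
    return 3.0 * (3 * k / (3 * k - 2)) ** (k / 2) * (k / (k + 2)) ** ((k + 2) / 2)


def oblongitis_loss(k):
    """Assumed closed form of L_ob (as a ratio) for a Gaussian source."""
    return math.sqrt(3.0) * ((3 * k - 2) / (3 * k)) ** (k / 2)


def to_db(ratio):
    """Convert an MSE ratio to decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    for k in (2, 4, 8, 12, 24):
        print(k, round(to_db(oblongitis_loss(k)), 4),
              round(to_db(point_density_loss(k)), 4))
    # Limiting values as k grows without bound.
    print("inf", round(to_db(math.sqrt(3.0) * math.exp(-1 / 3)), 4),
          round(to_db(3.0 * math.exp(-2 / 3)), 4))
```

The printed values can be compared with the oblongitis and point density columns of Table 1.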

A  related  analysis  shows  that  for  a  Gaussian  source  with 
memory,  an  optimal  transform  VQ  suffers  precisely  the  same 
losses  as  in  Table  1. 
  k     cubic     oblong's   pt dens.   shape      total
        Lcu       Lob        Lpt        LobLpt     LcuLobLpt

  2     0.1671    0.6247     0.5115     1.1362     1.3033
  4     0.3949′   0.8020     1.0721     1.8741     2.2690′
  8     0.6572′   0.8741     1.4373     2.3113     2.9686′
  12    0.8084′   0.8962     1.5744     2.4705     3.2789′
  24    1.0385′   0.9175     1.7203     2.6377     3.6762′
  ∞     1.5329    0.9380     1.8759     2.8139     4.3468

Table 1: Losses (in dB) for optimal PQs for a stationary,
memoryless Gaussian source. The "primed" losses are based on a
conjectured lower bound to $M_k$ [4, pp. 61,62].
Abstract - Vector quantization of spherically invariant
random processes (SIRP) is considered. In particular,
trellis coded quantization (TCQ) and lattice
vector quantization (LVQ) are investigated. For
performance evaluation, a random number generator has
been developed that produces sequences which can be
regarded as SIRP realizations. It turns out that in
most cases TCQ outperforms all other investigated
quantization methods, even those LVQ schemes
which are matched to the properties of SIRP sources.
Comparisons with bounds from rate distortion theory
are given as well.


I.  Introduction 

Vector quantization (VQ) plays a key role in lossy data
compression. The rate distortion bound of any source can be
reached in principle by VQ when the vector dimension tends
to infinity. Unfortunately, with increasing dimension, storage
and computational complexity tend to infinity as well. To
cope with this problem, it makes sense to consider methods
which reduce complexity by employing strongly structured
codebooks, such as lattice vector quantization (LVQ) and
trellis coded quantization (TCQ).


II.  SIRP  Model  Source 

A SIRP (in the strict sense) is a random process defined
by the property that every n-variate pdf of random
variables taken from the process can be written as
$f(x) = \pi^{-n/2} g_n(x^T x)$. The pdf is constant on hyper-spheres
centered around the origin.

A representation theorem due to Yao [1] states that every
SIRP can be regarded as a variance mixture of Gaussian
processes. The density function of a SIRP then satisfies

$$f_{SIRP}(x) = \int_0^{\infty} f_{Gauss}(x, r)\, f_\sigma(r)\, dr. \qquad (1)$$


Here $f_{Gauss}(x, r)$ denotes the multivariate density function
of a Gaussian process with standard deviation r, and $f_\sigma(r)$ is a
univariate density function called the sigma density, which controls
the distribution of the variance. The resulting source itself is
non-ergodic, as most natural processes (e.g. image and speech
processes) are.

In this contribution the sigma density is modeled in discrete
fashion. Specifically, $f_\sigma$ is modeled with two Dirac impulses,
each with a weight of 0.5, at the locations $\sigma_1$ and $\sigma_2$. We
constructed a random generator in which the sigma density is
controlled by a finite state machine with two states. The state
transition probability was fixed to a value of 0.2. The
overall variance of the model source was normalized to one,
which is equivalent to the condition $\sigma_2 = \sqrt{2 - \sigma_1^2}$.
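
The generator described above can be sketched as follows. This is only an illustration under stated assumptions: the two-state machine is taken to switch state with probability 0.2 after every sample (so both states are equally likely in the long run), and the names `sigma2_for_unit_variance`, `sirp_samples` and `p_switch` are hypothetical.

```python
import math
import random


def sigma2_for_unit_variance(sigma1):
    """Second standard deviation so that 0.5*sigma1^2 + 0.5*sigma2^2 = 1."""
    return math.sqrt(2.0 - sigma1 ** 2)


def sirp_samples(n, sigma1, p_switch=0.2, seed=None):
    """Draw n samples from a two-state, variance-switched Gaussian source.

    Assumption: the state machine flips between the two standard
    deviations with probability p_switch after each emitted sample.
    """
    rng = random.Random(seed)
    sigmas = (sigma1, sigma2_for_unit_variance(sigma1))
    state = rng.randrange(2)
    samples = []
    for _ in range(n):
        samples.append(rng.gauss(0.0, sigmas[state]))
        if rng.random() < p_switch:
            state = 1 - state
    return samples


if __name__ == "__main__":
    xs = sirp_samples(200000, sigma1=0.3, seed=1)
    print("sample variance:", sum(x * x for x in xs) / len(xs))
```

With the symmetric switching assumed here, the empirical variance should come out close to one for any admissible value of sigma1.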

The univariate pdf of this particular SIRP is then given by

$$f(x) = \frac{0.5}{\sqrt{2\pi}\,\sigma_1}\, e^{-x^2/(2\sigma_1^2)} + \frac{0.5}{\sqrt{2\pi}\,\sigma_2}\, e^{-x^2/(2\sigma_2^2)}. \qquad (2)$$

Note that in the special case $\sigma_1 = \sigma_2 = 1$ the SIRP is
Gaussian, and only in this case are the samples of the process
independent.


III.  Simulation  Results 

The parameter $\sigma_1$ of the SIRP random generator was
varied in the range from 0.3 to 1.0. All data samples obtained
in this way were encoded by TCQ at a rate R of 1, 2 and 3
bit/sample using a codebook with $2^{R+1}$ codewords. All
sequences were also encoded with a Lloyd-Max scalar quantizer
and with lattice vector quantizers.

The SNR for a SIRP source with $\sigma_1 = 0.3$ is plotted
in Figure 1 for different quantization methods. For comparison,
the Shannon lower bound for SIRP sources "SLB" (according
to [2]) and the first-order rate-distortion function "RDF" are
plotted as well. "D24" denotes direct quantization using the
24-dimensional D-lattice, and "D24Tr" an LVQ scheme due to
Herbert [3] which is matched to SIRP sources employing a
companding approach. Lastly, "Lloyd" denotes scalar
quantization using a Lloyd-Max quantizer.


Fig. 1: Comparison of VQ algorithms for SIRP sources with
$\sigma_1 = 0.3$

At rates of 2 and 3 bit/sample the TCQ with optimized
codebooks outperforms all other investigated quantizers. Only
at 1 bit/sample does direct lattice quantization yield a slight
improvement. A comparison with the Shannon lower bounds
again demonstrates the good performance of TCQ in the
SIRP case.
Abstract - Quantizer design algorithms for decentralized
estimation are presented. Scalar quantizer
design for the problem of multiple descriptions over
a multiple-access channel is also studied. The importance
of the initial index assignment is explained and
an algorithm to choose a good initial index assignment
is derived.

I.  Summary 

In  a  typical  decentralized  detection  and  estimation  sys¬ 
tem,  the  objective  is  to  estimate  a  certain  random  vari¬ 
able  at  a  fusion  center  by  using  the  observations  of  a 
set  of  sensors.  In  general,  the  observations  have  very 
large  entropy  rates,  and  therefore,  information  reduction 
at  the  sensors  before  transmission  is  necessary.  We  as¬ 
sume  that  this  information  reduction  is  accomplished  by 
scalar  quantization  and  the  quantized  values  are  trans¬ 
mitted  to  the  fusion  center  via  a  multiple-access  channel 
(MAC).  We  derive  quantizer  design  algorithms  to  min¬ 
imize  the  mean  squared  error  (MSE)  for  both  noiseless 
and  noisy  observations.  We  will  also  present  a  set  of  nu¬ 
merical  results. 

In  the  rest  of  this  summary,  we  study  the  related  prob¬ 
lem  of  multiple  descriptions  over  a  MAC. 

Consider  the  two  channel  diversity  system  where  the 
objective  is  to  transmit  a  certain  source  output  to  a  re¬ 
ceiver.  Assume  that  one  of  the  links  may  break  down 
during  the  transmission.  The  problem  is  to  send  descrip¬ 
tions  of  the  source  output  over  both  links  in  such  a  way 
that  the  overall  distortion  is  minimized  when  both  links 
are  available  and  at  the  same  time  a  minimum  fidelity  is 
guaranteed  when  one  of  the  links  is  broken.  This  setting 
is  called  the  multiple  descriptions  problem. 

In particular, when the distortion measure is the mean
squared error, the problem is to minimize

$$D_0 = E[(X - \hat{X})^2 \mid \text{both links available}]$$

subject to the constraints

$$D_l = E[(X - \hat{X})^2 \mid \text{only the } l\text{th link available}] \le D_{l,\max},$$

where l = 1, 2. We assume that the transmitter does not
know whether there is a broken link; the receiver, on the
other hand, does.

El  Gamal  and  Cover  [2]  studied  this  problem  from 
an  information  theoretical  point  of  view,  and  derived  an 
achievable  rate-distortion  region  for  a  memoryless  source, 
independent  channels  and  a  single  letter  fidelity  criterion. 

1  This  work  was  partially  supported  by  the  NSF  Grant  NCR- 
9101560 


In  [3],  Ozarow  proved  that  the  region  found  in  [2]  is  ac¬ 
tually  the  rate  distortion  region  for  a  Gaussian  source 
with  mean  squared  distortion  measure.  Vaishampayan 
has  considered  the  multiple  description  scalar  quantizer 
design  problem  for  independent  noiseless  channels  [4]. 
Our  work  here  is  the  generalization  of  the  work  in  [4]  for 
the  case  of  a  noisy  and  possibly  dependent  transmission 
medium  (i.e.  a  multiple-access  channel). 

We  assume  that  the  source  statistics  and  the  channel 
characteristics  are  known,  and  derive  a  quantizer  design 
algorithm  to  minimize  the  Lagrangian  by  employing  some 
type  of  joint  source  and  channel  coding. 
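
One common way to write such a Lagrangian is sketched below. It is an assumption, since the summary does not state the exact form: $D_0$ denotes the distortion when both links are available, $D_1$ and $D_2$ the distortions when only one link is available, and the multipliers $\lambda_1, \lambda_2$ are hypothetical symbols introduced for illustration.

```latex
J(\lambda_1, \lambda_2) = D_0 + \lambda_1 D_1 + \lambda_2 D_2 ,
\qquad \lambda_1, \lambda_2 \ge 0 .
```

Sweeping the multipliers trades the both-links distortion against the two single-link distortions, which is how different points of the constrained problem would be reached under this formulation.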

It  turns  out  that,  even  for  the  case  of  independent 
noiseless  links  the  initial  index  assignment  is  very  impor¬ 
tant.  In  [4],  some  good  index  assignment  strategies  for 
independent  noiseless  links  are  presented.  In  our  setting, 
the  index  assignment  problem  is  twofold  —  one  due  to 
the  noisy  (asymmetric)  nature  of  the  links  and  the  other 
due  to  the  multiple  descriptions.  Since  the  transmission 
medium  is  not  fixed,  it  is  not  plausible  to  obtain  a  fixed  in¬ 
dex  assignment  strategy.  We  present  an  algorithm  based 
on  simulated  annealing  to  choose  a  good  initial  index  as¬ 
signment. 

In order to complete the solution of the problem, one
has to vary the Lagrange multipliers, apply the design
algorithm, and consider time-sharing of the resulting
strategies. The computational requirements of the
solution are therefore high; on the other hand, the
computations can be made off-line, and no on-line
computation is necessary. It can also be shown that
time-sharing of three strategies is always sufficient to obtain
the optimal performance. A set of numerical examples
that illustrates the use of the algorithm and the general
performance improvement by a good initial index
assignment will also be presented.
Abstract - This paper presents an entropy-constrained version
of the Modified Pairwise Nearest Neighbor (MPNN) algorithm
[1] for the design of efficient vector quantizers. We call this
new algorithm Entropy-Constrained Modified Pairwise Nearest
Neighbor (ECMPNN).

I.  Introduction 

The  proposed  ECMPNN  technique  follows  the  idea  of  the  MPNN 
algorithm  to  design  an  entropy-constrained  codebook.  MPNN  is 
derived  from  the  well  known  PNN  algorithm  [2].  The  MPNN 
clustering  starts  with  an  initial  codebook  of  the  desired  size, 
containing  vectors  from  the  training  set  (TS).  Each  vector  in  the 
initial  codebook  is  considered  a  separate  cluster.  At  each  step,  one 
new cluster is formed by taking a new TS vector. The number of
clusters is maintained at the desired size by merging the two
closest clusters. MPNN maintains a superior quality of the
generated  codebooks,  and  it  requires  as  many  multiplications  as 
the  LBG  algorithm  [3]  needs  for  two  iterations  [1]. 

The problem of entropy-constrained vector quantization is to
choose the clustering and the codebook in such a way as to
minimize the overall distortion subject to an entropy constraint.
The solution proposed by the CLG-ECVQ [4] technique uses the
Lagrangian formulation. The algorithm minimizes the functional
$J = D + \lambda R$, where the parameter $\lambda$ is the slope of the
distortion-rate curve. By varying $\lambda$, all the distortion/rate pairs
on the convex hull of the operational distortion-rate curve can be found.

II. ECMPNN Algorithm

The ECMPNN clustering begins with an initial codebook $C_0$ of
size N, filled with N randomly chosen TS vectors
$\{X_0, X_1, \ldots, X_{N-1}\}$. Each vector in the initial codebook
corresponds to one cluster. The initial codebook can be written as

$$C_0 = \{Y_0, Y_1, \ldots, Y_{N-1}\} = \{X_0, X_1, \ldots, X_{N-1}\} \qquad (1)$$

At each i-th step, one new cluster is formed by a new vector from
the TS. The N+1 clusters are converted into N clusters by merging
two clusters. The merge is chosen so that it is optimum in the
distortion-rate sense. The strategy for finding the best merge is
as follows. Let us denote by $(D_{i-1}, R_{i-1})$ the
distortion/rate pair corresponding to the (i-1)th step of ECMPNN.
At the i-th step, we can consider each possible merge of two
clusters and compute the slope to any other distortion/rate pair. To
find the best merge, it is sufficient to find the merge which yields
the smallest magnitude slope. If $(D_i, R_i)$ is the distortion/rate pair
that results from the best merge, then any other merge which
yields a slope of larger magnitude will necessarily lie above the
line connecting $(D_{i-1}, R_{i-1})$ and $(D_i, R_i)$. Thus, the merge of
two clusters must be chosen so that the ratio of distortion increment
to entropy decrement induced by the merge is minimum, that is

$$\lambda_i = \Delta D_i / \Delta R_i = \min \qquad (2)$$
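
The merge criterion (2) can be sketched as follows. This is only an illustration under stated assumptions: the distortion increment of a merge is taken to be the usual PNN cost n_a n_b / (n_a + n_b) times the squared distance between the two centroids, and the entropy decrement is computed from the cluster populations alone. The function names are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def merge_distortion_increase(ca, na, cb, nb):
    """Increase in total squared error when two clusters are merged
    (assumed PNN merge cost)."""
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * d2


def entropy_decrease(na, nb, total):
    """Drop in index entropy (bits) when clusters of size na, nb merge."""
    pa, pb = na / total, nb / total
    pab = pa + pb
    return pab * math.log2(pab) - pa * math.log2(pa) - pb * math.log2(pb)


def best_merge(centroids, counts):
    """Return (ratio, i, j) for the pair minimizing delta_D / delta_R."""
    total = sum(counts)
    best = None
    for i, j in combinations(range(len(centroids)), 2):
        d_inc = merge_distortion_increase(centroids[i], counts[i],
                                          centroids[j], counts[j])
        r_dec = entropy_decrease(counts[i], counts[j], total)
        ratio = d_inc / r_dec
        if best is None or ratio < best[0]:
            best = (ratio, i, j)
    return best


if __name__ == "__main__":
    cents = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
    print(best_merge(cents, [1, 1, 1]))
```

Because the entropy decrement here depends only on the cluster sizes, its values could be tabulated in advance, leaving one division per candidate merge at run time.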

We are now in a position to describe the proposed algorithm.
At the first step the (N+1) clusters are given by

$$\{\{X_0\}, \{X_1\}, \ldots, \{X_{N-1}\}, \{X_N\}\} \qquad (3)$$

where $X_N$ is the new vector from the TS. Let us suppose that the
vectors $X_0$ and $X_1$ give the minimum ratio (2). Then the
algorithm places these two vectors in the same cluster,
and the resulting N clusters are described by

$$\{\{X_0, X_1\}, \{X_N\}, \{X_2\}, \ldots, \{X_{N-1}\}\} \qquad (4)$$

The codebook is then modified by replacing the first codeword in
the codebook with the centroid of vectors $X_0$ and $X_1$, and the
second codeword with the vector $X_N$. Note that the remaining
codewords are unchanged. The resulting codebook is

$$C_1 = \{X_{01}, X_N, X_2, \ldots, X_{N-1}\} \qquad (5)$$

where $X_{01}$ signifies the mean of vectors $X_0$ and $X_1$. At the
second step, by taking a new vector $X_{N+1}$ from the TS, the (N+1)
clusters are given by

$$\{\{X_0, X_1\}, \{X_N\}, \{X_2\}, \ldots, \{X_{N-1}\}, \{X_{N+1}\}\} \qquad (6)$$

If we further suppose that $X_2$ and $X_{N+1}$ are the closest two
vectors in the set (6), then the second step gives N clusters

$$\{\{X_0, X_1\}, \{X_N\}, \{X_2, X_{N+1}\}, \ldots, \{X_{N-1}\}\} \qquad (7)$$

and the resulting codebook is

$$C_2 = \{X_{01}, X_N, X_{2,N+1}, \ldots, X_{N-1}\} \qquad (8)$$

where $X_{2,N+1}$ is the mean of vectors $X_2$ and $X_{N+1}$. This process
is continued until all the training vectors have been considered.
Since the entropy decrement in (2) depends only on the number of
vectors in the two considered clusters, its values can be computed
and stored off-line. Therefore, ECMPNN requires only one
additional division per step compared to MPNN (see [1] for a
discussion of MPNN's computational complexity).

III. Computer Simulation Results

Simulations  on  a  variety  of  test  images  showed  that  the  ECMPNN 
algorithm  runs  significantly  faster  than  CLG-ECVQ,  without 
sacrificing  performance.  A  block  size  of  4  x  4  pixels  (or  vector  size 
16)  was  used,  and  the  mean  value  of  each  training  vector  was 
removed  before  the  codebook  design.  With  the  codebook  so 
obtained,  each  test  image  was  entropy  coded.  In  particular,  image 
Lenna  (from  outside  the  TS)  of  512  x  512  pixels,  256  gray  levels, 
was  coded  at  a  bit  rate  of  0.359  bpp  with  a  PSNR  of  30.46  dB. 
Abstract - The minimum average error probability
achievable  by  block  codes  on  the  two-user  multiple- 
access  channel  is  investigated.  A  new  exponential  up¬ 
per  bound  is  found  which  can  be  achieved  universally 
for  all  discrete  memoryless  multiple-access  channels 
with  given  input  and  output  alphabets.  It  is  shown 
that  the  exponent  of  this  bound  is  greater  than  or 
equal  to  those  of  previously  known  bounds.  More¬ 
over,  examples  are  given  where  the  new  exponent  is 
strictly  larger. 

Summary 

One of the central problems in multiuser information theory
is to determine the minimum average error probability which
can be achieved on a two-user discrete memoryless multiple-access
channel using a block code with rate pair $(R_X, R_Y)$ and
blocklength n. The most fundamental result of this theory is
the coding theorem of Ahlswede [1] and Liao [4], which asserts
that, for any $(R_X, R_Y)$ in the interior of a certain set C and
all sufficiently large n, there exists a multiuser code with an
error probability arbitrarily close to zero. Conversely, for any
$(R_X, R_Y)$ outside of C, the error probability is bounded away
from zero. The set C, which is called the capacity region, is
the convex closure of the set of rate pairs $(R_X, R_Y)$ satisfying

$$0 \le R_X \le I(X \wedge Z \mid Y),$$

$$0 \le R_Y \le I(Y \wedge Z \mid X), \qquad (1)$$

$$R_X + R_Y \le I(XY \wedge Z)$$

for some choice of independent input random variables X and
Y, where Z is the corresponding channel output.
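
For a concrete feel for the three bounds in (1), the sketch below evaluates them for given input distributions and a channel transition law. It is an illustration only: the function names and the dictionary-based channel representation are assumptions, and the binary adder channel Z = X + Y in the usage example is a standard textbook case rather than one taken from this summary.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def mac_rate_bounds(px, py, channel):
    """Return (I(X;Z|Y), I(Y;Z|X), I(XY;Z)) in bits.

    px, py: dicts of input probabilities (independent inputs).
    channel[(x, y)]: dict mapping each output z to its probability.
    """
    joint = defaultdict(float)
    for x, p1 in px.items():
        for y, p2 in py.items():
            for z, w in channel[(x, y)].items():
                joint[(x, y, z)] += p1 * p2 * w

    def marginal(keep):
        m = defaultdict(float)
        for key, p in joint.items():
            m[tuple(key[i] for i in keep)] += p
        return m

    h_xyz = entropy(joint)
    h_xy = entropy(marginal((0, 1)))
    h_xz = entropy(marginal((0, 2)))
    h_yz = entropy(marginal((1, 2)))
    h_x = entropy(marginal((0,)))
    h_y = entropy(marginal((1,)))
    h_z = entropy(marginal((2,)))
    i_x = h_xy + h_yz - h_y - h_xyz   # I(X;Z|Y)
    i_y = h_xy + h_xz - h_x - h_xyz   # I(Y;Z|X)
    i_xy = h_xy + h_z - h_xyz         # I(XY;Z)
    return i_x, i_y, i_xy


if __name__ == "__main__":
    uniform = {0: 0.5, 1: 0.5}
    adder = {(x, y): {x + y: 1.0} for x in (0, 1) for y in (0, 1)}
    print(mac_rate_bounds(uniform, uniform, adder))
```

For uniform inputs on the noiseless binary adder channel the three bounds evaluate to 1, 1 and 1.5 bits, so the sum-rate constraint is the binding one.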

Over  the  past  twenty  years,  stronger  versions  of  this  cod¬ 
ing  theorem,  which  give  exponential  upper  bounds  on  the  er¬ 
ror  probability,  have  been  derived  by  Slepian  and  Wolf  [7], 
Dyachkov  [2],  Gallager  [3],  and  Pokorny  and  Wallmeier  [6]. 
Pokorny  and  Wallmeier’s  coding  theorem  is  particularly  strong 
because  it  asserts  the  existence  of  universal  multiuser  codes. 
By  this  we  mean  that  a  fixed  choice  of  codewords  and  decoding 
sets  achieves  the  upper  bound  for  all  multiple-access  channels 
with  given  input  and  output  alphabets. 

In this work, we derive a new upper bound for the minimum
error probability which can be achieved on the multiple-access
channel using a block code with rate pair $(R_X, R_Y)$
and blocklength n. Like Pokorny and Wallmeier's result, our
bound is universally achievable for all multiple-access channels
with given input and output alphabets. The proof involves a
new multiuser packing lemma and a new universal decoding
rule which seeks to minimize the empirical equivocation of
the users' codewords given the channel output. We show that
the exponent of this bound is always greater than or equal to
those given in [2, 3, 6, 7]. Moreover, we give examples in which
the new exponent is strictly larger. Hence, the corresponding
bound on the minimum error probability is tighter for large
n.

1 This work was supported by the National Science Foundation
under grant NCR-9217457.
Abstract - Pombra and Cover [1] and Thomas [2]
showed  that  the  maximum  achievable  throughput 
(sum  of  rates  of  all  users)  of  a  Gaussian  multiple  access 
channel  with  feedback  is  at  most  twice  that  achievable 
without  feedback.  We  prove  a  stronger  result  which 
establishes  the  factor-of-two  bound  not  only  for  the 
total  throughput  but  for  the  entire  capacity  region  as 
well.  Specifically,  we  show  that  the  capacity  region 
of  a  Gaussian  multiple  access  channel  with  feedback 
is  contained  within  twice  the  capacity  region  without 
feedback. 


Fig. 1: The factor-of-two bound for two users.


I.  Introduction 

A channel use at time j of a Gaussian multiple access channel
(MAC) involves m independent users each transmitting a
real number $X_{ij}$, $i \in \{1, \ldots, m\}$. Thus, $X_{ij}$ denotes the
transmission of the ith user at time j. A single receiver observes
$Y_j = \sum_{i=1}^{m} X_{ij} + Z_j$, where $Z_j$ is a sample from an arbitrary
Gaussian noise process with known n-block covariance $K_Z^{(n)}$.
The channel is assumed to operate in one of two modes: with
or without feedback. In the no-feedback mode, the users base
their transmissions exclusively on the messages they wish to
send to the receiver, which are assumed random and independent
of each other and the noise. With feedback, the users
can adapt their transmissions based on previously received
symbols (the $Y_j$'s) available to each user over a noiseless and
delayless feedback link. In both cases the users' transmissions
must satisfy average power constraints, $n^{-1} \sum_{j=1}^{n} X_{ij}^2 \le P_i$.
Since the same feedback signal is observed by all users, they
can cooperate to some extent and achieve higher reliable
transmission rates than in the absence of feedback. Memory in the
noise, if it is non-white, can also be exploited for additional
gains. Therefore, the capacity region of the Gaussian MAC
with feedback strictly includes the capacity region without
feedback. The gains with feedback are, however, limited to a
factor of two, which we prove in the following theorem.

Theorem 1 (Factor-of-two bound) If $(R_1^{FB}, \ldots, R_m^{FB})$
is an achievable rate vector for a Gaussian MAC with
feedback under power constraints $(P_1, \ldots, P_m)$, then
$(R_1^{FB}/2, \ldots, R_m^{FB}/2)$ is an achievable rate vector
without feedback.

Figure 1 illustrates the theorem for the case of two users
(m = 2). The boundary of the capacity region of a two user
Gaussian MAC with feedback ($C_{FB}$) lies in the shaded region
between the boundary of the no-feedback capacity region (C)
and twice this boundary (2C).

II.  Proof  outline 

The  proof  of  Theorem  1  relies  on  two  theorems.  The  first, 
proved  by  Keilers  [3],  gives  the  capacity  region  of  an  m  user 
Gaussian  MAC  without  feedback. 

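Theorem 2 (No-feedback theorem) Rates $(R_1, \ldots, R_m)$
are achievable for expected average power constraints
$(P_1, \ldots, P_m)$ if and only if for all $\epsilon > 0$ and all n sufficiently
large, there exist n x n covariance matrices $K_{X_1}^{(n)}, \ldots, K_{X_m}^{(n)}$
satisfying

$$R(S) \le \frac{1}{2n} \log \frac{\det\left( K_Z^{(n)} + \sum_{i \in S} K_{X_i}^{(n)} \right)}{\det K_Z^{(n)}} + \epsilon \qquad (1)$$

for all $S \subseteq \{1, \ldots, m\}$, with $\frac{1}{n}\,\mathrm{trace}(K_{X_i}^{(n)}) \le P_i$ for all i.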


The  second  theorem  is  an  extension  of  a  result  of  Pombra 
and  Cover  [1]  and  Thomas  [2]  and  provides  an  outer  bound 
on  the  capacity  region  of  the  Gaussian  MAC  with  feedback. 

Theorem 3 (Feedback theorem) If $(R_1^{FB}, \ldots, R_m^{FB})$ is
an achievable rate vector with feedback for expected average
power constraints $(P_1, \ldots, P_m)$, then for all $\epsilon > 0$ and
all n sufficiently large, there exists a joint distribution on
$(X_1^n, \ldots, X_m^n, Z^n)$ with the marginal on $Z^n$ equal to the noise
distribution, and covariance matrices satisfying

$$R^{FB}(S) \le \frac{1}{2n} \log \frac{\det K^{(n)}_{Z + X(M) - X(M^c \cap S)}}{\det K_Z^{(n)}} + \epsilon \qquad (2)$$

for all pairs of nested subsets $M \subseteq S \subseteq \{1, \ldots, m\}$, with
$\frac{1}{n}\,\mathrm{trace}(K_{X_i}^{(n)}) \le P_i$ for all i.


In the above theorems, $R(S) = \sum_{i \in S} R_i$ and
$X(S) = \sum_{i \in S} X_i^n$ respectively denote rate sums and transmission
sums over subsets of users.

We outline the proof of the factor-of-two bound (Theorem 1)
as follows. A vector of rates $(R_1^{FB}, \ldots, R_m^{FB})$ achievable
with feedback satisfies the inequalities of the feedback
theorem (Theorem 3) for some covariance structure
for all n sufficiently large. Using some combinatorial lemmas,
we show that this covariance structure also satisfies
the inequalities of the no-feedback theorem (Theorem 2) for
$(R_1^{FB}/2, \ldots, R_m^{FB}/2)$. This establishes that the rate vector
$(R_1^{FB}/2, \ldots, R_m^{FB}/2)$ is achievable without feedback.

Abstract — We discuss the achievable $\epsilon$-error
throughput for the uncoordinated (asynchronous) T-user
M-frequency multiple-access channel without intensity
information. The problem is formulated in
terms of frequencies, but the results are also applicable
to Pulse Position Modulation (PPM) schemes.
We show that the achievable sum rate for T users reduces
from (M - 1) bits per channel use in the fully
coordinated multi-access situation to (M - 1) · ln(2)
bits per channel use if we assume no coordination between
users. In particular, the result shows that for
multi-tone M-ary frequency shift keying multiple access
in asynchronous operation, for instance, multiple
user interference reduces the capacity only by a factor
ln(2) ≈ 0.693 relative to the ideal TDMA system.

I.  Summary 

Cohen, Heller and Viterbi [1] presented a new approach to
completely asynchronous multiple access digital communications.
In asynchronous multiple access one assumes that T
individual users can access the system independently of the
other users. Each user transmits by means of on-off signaling
without regard to, or knowledge of, the remaining T - 1
users. In a synchronized Time-Division Multiple Access
(TDMA) system, each user would be assigned 1/T of the
available dimensions, and with on-off signaling the transmission
rate can be 1 bit/dimension, yielding a capacity per user
of 1/T bits/dimension. In [1] it has been shown that in the
asynchronous system, multiple user interference reduces the
total capacity for T users only by a factor of ln(2) ≈ 0.693
relative to the ideal TDMA system. This efficiency can be
achieved by using low-duty-cycle signaling. A practical example
of such signaling is multi-tone (M-tone) frequency shift
keying, where a specific user transmits one out of M orthogonal
frequencies. For PPM, the signaling interval is partitioned
into M sub-intervals or time slots. During a signaling interval,
only one of the M sub-intervals is used to transmit a pulse.

Chang  and  Wolf  [2]  considered  the  synchronous  T-user  M- 
frequency  noiseless  multiple  access  channel  where  only  one 
receiver  decodes  all  users  simultaneously.  For  a  large  number 
of  users,  the  channel  capacity  approaches  (M  —  1)  bits  per 
signaling  interval  of  M  frequencies.  The  results  were  arrived 
at  by  a  computer  search. 

We derive an achievable rate for the asynchronous T-user
M-frequency noiseless multiple access channel and show that
the achievable rate reduces from (M - 1) bits per channel use in
the fully coordinated multi-access situation to (M - 1) · ln(2)
bits per channel use if we assume no coordination between
users, or one-to-one communication. We start with 2-tone
signaling and show the nature of the detection problem. For
each individual user, the 2-tone multiple access channel is,
from a capacity point of view, equivalent to the binary-input
binary-output Z-channel. Although the 2-tone multiple access
channel has a ternary output, capacity is the same as if we
make a hard decision in case of an ambiguous reception of
two frequencies. The asymptotic optimizing input distribution
is highly asymmetric, indicating that each of the users must
transmit a low duty-cycle signal.
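
The Z-channel view of the 2-tone case can be explored numerically. The following is a minimal sketch under assumptions that go beyond the text: each user independently sends the "active" tone with probability p and the "idle" tone otherwise, the receiver only observes which tones are present, a hard decision is made when both tones are received, and each user is decoded on its own with the other T - 1 users acting as noise. The function names are hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def z_channel_rate(p, T):
    """I(X;Y) in bits for one user of the assumed 2-tone model.

    X = 1 (active tone, probability p) is always detected.
    X = 0 is flipped to 1 whenever at least one of the other
    T - 1 users is active, which gives a Z-channel.
    """
    others_idle = (1.0 - p) ** (T - 1)   # P(no other user is active)
    p_y0 = (1.0 - p) * others_idle       # P(Y = 0)
    return h2(p_y0) - (1.0 - p) * h2(others_idle)


def best_sum_rate(T, grid=3000):
    """Grid search over the duty cycle p = c / T, with c in (0, 3]."""
    best_rate, best_p = 0.0, 0.0
    for k in range(1, grid + 1):
        p = (k / 1000.0) / T
        if p >= 1.0:
            break
        rate = T * z_channel_rate(p, T)
        if rate > best_rate:
            best_rate, best_p = rate, p
    return best_rate, best_p


if __name__ == "__main__":
    for T in (2, 10, 100, 1000):
        rate, p = best_sum_rate(T)
        print(f"T={T:5d}  sum rate={rate:.4f} bits  duty cycle p={p:.5f}")
    print(f"ln(2) = {math.log(2):.4f}")
```

In this model the optimizing duty cycle shrinks roughly like ln(2)/T as T grows, which is consistent with the low duty-cycle signaling described above, and the sum rate tends to ln(2) bits per channel use.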

We extend the system to M ≥ 3 frequencies and give an
input distribution from which it follows that the channel capacity
is upper bounded by M · ln(2) and lower bounded by
(M - 1) · ln(2) bits per frequency interval. Since the channel
transition probabilities are functions of the input distribution,
we cannot use the Kuhn-Tucker conditions for a candidate
(capacity achieving) input distribution as given above. Instead,
we prove that the achievable rate C(T, M) for the channel
with M frequencies and T users asymptotically approaches
(M - 1) · ln(2), that is, C(T, M) → (M - 1) · ln(2) for M ≥ 2 fixed and T → ∞.

II.  Conclusions 

We  summarize  the  results  as  follows: 

1. The capacity for the asynchronous T-user M-frequency
noiseless multiple access channel approaches ln(2) bits
per frequency (dimension);

2. The capacity achieving distribution puts almost all of its
mass on one frequency and divides the remaining probability
mass equally on the remaining M - 1 frequencies;

3. Instead of using one out of $2^M - 1$ combinations of frequencies
from a given frequency interval with M orthogonal
frequencies, capacity can be achieved by using only
a single frequency from an M-frequency interval, which
is the advantage of the M-tone frequency shift keying
systems.

4. The cut-off rate $R_{comp}$ approaches (M - 1) · 0.413 bit per
signaling interval for the same type of input probability
distribution as given in 2.

5.  A  practical  coding  scheme  achieving  (1  —  e)  •  ln(2)  bits 
per  dimension  with  vanishing  decoding  error  probability 
is  given.  The  coding  method  is  equivalent  to  Frequency 
Hopping  MFSK  and  extends  and  modifies  the  strategy 
as  given  in  [1]. 

Abstract — A new region $\mathcal{R}$ of achievable rate pairs
$(R_1, R_2) \in \mathcal{R}$ is established for the binary multiplying
channel. The new region $\mathcal{R}$ has an equal rate point of
$R_1 = R_2 = 0.63072$ bit per transmission.


I.  Definitions 

This paper is concerned with the binary multiplying channel
(BMC) [1]. The capacity region of the BMC is bounded
by the Shannon inner bound region $G_i$ and the Shannon outer
bound region $G_o$. These regions are plotted in Fig. 1.

Communication over the BMC by two distant terminals
is modeled as follows. A message $\Theta_t$ at terminal $t$, $t = 1, 2$,
is encoded into the channel input sequence
$X_t = (X_{t,1}, X_{t,2}, \ldots, X_{t,n})$. The common channel output sequence
$Y_1 = Y_2 = Y = (Y_1, Y_2, \ldots, Y_n)$ is formed such that
$Y_j = X_{1,j} X_{2,j}$, $X_{t,j} \in \{0, 1\}$, $j = 1, 2, \ldots, n$. Note that the
first channel input $X_{t,1}$ is based on the message $\Theta_t$ only,
while the $k$-th channel input $X_{t,k}$, $k = 2, 3, \ldots, n$, is based on
both the local message $\Theta_t$ and the previous channel outputs
$(Y_1, Y_2, \ldots, Y_{k-1})$. The decoder at terminal $t$ estimates the
other terminal's message $\Theta_{3-t}$ from both the channel output
sequence $Y$ and the local message $\Theta_t$.
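
For orientation, the equal rate point of the Shannon inner bound region can be evaluated numerically. The sketch below assumes the standard form of the inner bound with independent inputs, $R_1 \le p_2 h(p_1)$ and $R_2 \le p_1 h(p_2)$ with $p_t = \Pr\{X_t = 1\}$ and $h$ the binary entropy function, and searches the symmetric case $p_1 = p_2$. The function names are hypothetical and the grid search is only an illustration.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def inner_bound_rates(p1, p2):
    """(R1, R2) for independent inputs with P(X_t = 1) = p_t on Y = X1 * X2.

    Terminal 2 learns X1 only when its own input is 1, hence R1 = p2 * h(p1).
    """
    return p2 * h2(p1), p1 * h2(p2)


def equal_rate_point(steps=100000):
    """Maximise the common rate p * h(p) over a grid of symmetric inputs."""
    best_rate, best_p = 0.0, 0.0
    for k in range(1, steps):
        p = k / steps
        rate, _ = inner_bound_rates(p, p)
        if rate > best_rate:
            best_rate, best_p = rate, p
    return best_rate, best_p


if __name__ == "__main__":
    rate, p = equal_rate_point()
    print(f"equal rate point of the inner bound: {rate:.5f} at p = {p:.4f}")
```

The resulting value of roughly 0.617 bit per transmission gives a reference point for the equal rate points quoted for the coding strategies in this paper.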

A  coding  strategy  for  the  BMC  is  described  as  a  progressive 
subdivision  of  the  [0, 1)  x  [0, 1)  square.  Therefore,  the  proba¬ 
bility  of  each  resolution  product  that  occurs  in  this  progressive 
subdivision  of  the  unit  square  is  equal  to  its  area. 


II.  Schalkwijk’s  1983  Coding  Strategy 

The 1983 coding strategy is composed of alternating so-called
inner and outer bound transmissions. Let Pr[i] and
Pr[o] denote the average code word length of the inner and
outer bound transmissions, respectively. Of course, Pr[i] = 1.
Let $I(\Theta_t; Y | \Theta_{3-t}, i)$ and $I(\Theta_t; Y | \Theta_{3-t}, o)$ denote the
information rate of an inner and an outer bound transmission from
encoder $t$ to decoder $3 - t$, respectively. The achievable rate
region of the 1983 coding strategy satisfies

$$\mathcal{R}' = \left\{ (R_1, R_2) : 0 \le R_t \le \frac{\Pr[i]\, I(\Theta_t; Y | \Theta_{3-t}, i) + \Pr[o]\, I(\Theta_t; Y | \Theta_{3-t}, o)}{\Pr[i] + \Pr[o]},\ t = 1, 2 \right\}.$$

The region $\mathcal{R}'$ has an equal rate point of $R_1 = R_2 = 0.63056$
bit per transmission and includes the region $G_i$. In the unit
square, a message pair $(\Theta_1, \Theta_2)$ is always situated in a
subrectangle after an inner bound transmission and a subsequent
outer bound transmission. Thus, the inner and outer bound
transmissions can be repeated ad infinitum in all these
subrectangles.


III.  The  New  Coding  Strategy 

The new coding strategy consists of a structure of inner
bound transmissions of average code word length 3 Pr[i], such
that (i) an efficient resolution product is generated, and (ii)
an unlimited number of repetitions of this resolution product
is generated. The subdivision of these efficient resolution
products is completed by (i) outer bound transmissions of
average code word length 3 Pr[o] - L[loss], and (ii) three new
transmissions of average code word length L[gain]. In fact,
the new coding strategy, see [3], is a modification of the 1983
coding strategy that results in both a loss and a gain with
respect to its original. Let $I_t[\text{gain}]$ denote the average mutual
information of the three new transmissions from encoder
$t$ to decoder $3 - t$; then the achievable rate region of the new
strategy satisfies

$$\mathcal{R} = \left\{ (R_1, R_2) : 0 \le R_t \le \frac{3\Pr[i]\, I(\Theta_t; Y | \Theta_{3-t}, i) + (3\Pr[o] - L[\text{loss}])\, I(\Theta_t; Y | \Theta_{3-t}, o) + I_t[\text{gain}]}{3\Pr[i] + 3\Pr[o] - L[\text{loss}] + L[\text{gain}]},\ t = 1, 2 \right\}.$$

The new region $\mathcal{R}$ has an equal rate point of $R_1 = R_2 = 0.63072$
bit per transmission and includes the region $\mathcal{R}'$. The
results of van Overveld [4] prove that all rate pairs $(R_1, R_2) \in \mathcal{R}$
are operationally achievable. The new region $\mathcal{R}$ is also
plotted in Fig. 1.

Fig. 1: The new region $\mathcal{R}$ of achievable rate pairs.

I. Introduction

Consider a multi-user white noise channel of bandwidth W
Hz and a given white noise spectral density, with M users all received
at power P, all requiring the same rate R bits/sec. It is well
known that there is a Shannon capacity for the channel
and that it is achievable by FDMA. It is also well known that the
capacity can be achieved by an interference cancellation
procedure that involves $M^2$ cancellations, and Rimoldi and
Urbanke [3] have recently shown that it is achievable with at
most 2M cancellation steps. In the present paper we show
that we can achieve rates arbitrarily close to capacity with
O(1) cancellation steps in the particular case of equal powers
and equal rates. The results in the present paper first
appeared in Hanly [1].

II.  Interference  cancellation 

It is well known that the equal rate Shannon capacity can
be achieved by a combination of time sharing and interference
cancellation (Wyner [4]). It can equally well be achieved by
a combination of frequency sharing and interference cancellation.
In such a scheme there are M subchannels of bandwidth
W/M and all users send in all sub-bands. In sub-band 1 we might
decode user 1 first, subtract its signal, then decode user 2, and
so on. We choose the orderings of the users in the sub-bands
in such a way that each user is decoded in the jth position
precisely once out of all the sub-bands. This is exactly
the same as time sharing, except that we do not require time
synchronization.

Both time-sharing and frequency-sharing cancellations require
M cancellation steps in each sub-band, for a total of $M^2$
cancellation steps. In the present paper we consider a scheme
with O(1) cancellation steps.

III.  Interference  cancellation  of  groups 

Let us partition the bandwidth into J subchannels, each of
bandwidth W/J, and partition the users into J groups. We order
the groups among the subchannels, just as in the frequency
sharing interference cancellation scheme.

Without loss of generality, assume that the groups in
subchannel 1 are decoded in the order $Q_J, Q_{J-1}, \ldots, Q_1$. In the
first cancellation step we decode all the users in $Q_J$ in parallel.
The decoder of a $Q_J$ user treats the interference from all
other users in $Q_J$, as well as $Q_{J-1}, \ldots, Q_1$ (i.e. all other users), as
random noise. The decoded signals of the $Q_J$ users are passed
to an adder, the sum $Q_J$ signal is reconstituted, and this is then
subtracted from the total received signal. Users in $Q_{J-1}$ are
then decoded in parallel. A $Q_{J-1}$ decoder treats the interference
from all other $Q_{J-1}$ users, and the users in $Q_1, \ldots, Q_{J-2}$,
as random noise. Note that the interference from $Q_J$ users
has been subtracted out. This process continues and requires
a total of J cancellation steps. Finally, the decoder of a $Q_1$
user only has to contend with interference from other $Q_1$ users.

We are interested in the limiting regime in which M, the
number of users, grows large. We scale the bandwidth linearly
with M, $W = W_0 M$, but the common received power P is
fixed. The number of groups, J, is also fixed, so the number
of users in each group is M/J. Let $R_j^{(M)}$ be the bit rate of a
$Q_j$ user in subchannel 1.

Result 1 Let $\alpha$, the ratio of the received power P to the noise
power in bandwidth $W_0$, be fixed. Then

$$R_j^{(M)} = \frac{W_0}{\ln 2} \, \frac{\alpha/J}{1 + j\alpha/J} + O(1/M) \ \text{bits/sec}$$

and the common bit rate $R^{(M)}$ is given by

$$R^{(M)} = \frac{W_0}{\ln 2} \sum_{j=1}^{J} \frac{\alpha/J}{1 + j\alpha/J} + O(1/M) \ \text{bits/sec.}$$

The Shannon capacity of the channel is independent of M
and is given by $C = W_0 \log_2(1 + \alpha)$ bits/sec. Moreover,

Result 2 $C = \frac{W_0}{\ln 2} \sum_{j=1}^{J} \frac{\alpha/J}{1 + j\alpha/J} + O(1/J)$ bits/sec.

Sketch Proof: We write $C = \frac{W_0}{\ln 2} \int_1^{1+\alpha} \frac{du}{u}$ and then take
a Riemann sum approximation.
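
The Riemann sum argument behind Result 2 can be checked numerically. The sketch below works in units of $W_0$, so it compares $\log_2(1 + \alpha)$ with the J-term sum; the function names are hypothetical, and the value of $\alpha$ used in the demonstration is an arbitrary choice.

```python
import math


def capacity_per_w0(alpha):
    """Shannon capacity C / W0 = log2(1 + alpha), in bits/sec per Hz of W0."""
    return math.log2(1.0 + alpha)


def group_rate_sum_per_w0(alpha, J):
    """Right-endpoint Riemann sum of (1 / ln 2) * du / u over [1, 1 + alpha]."""
    step = alpha / J
    return sum(step / (1.0 + j * step) for j in range(1, J + 1)) / math.log(2)


def gap_per_w0(alpha, J):
    """Capacity minus the rate reached with J groups (both divided by W0)."""
    return capacity_per_w0(alpha) - group_rate_sum_per_w0(alpha, J)


if __name__ == "__main__":
    alpha = 1.0  # arbitrary illustrative value
    for J in (1, 2, 5, 10, 100):
        gap = gap_per_w0(alpha, J)
        print(f"J={J:4d}  gap={gap:.5f}  J*gap={J * gap:.4f}")
```

Because 1/u is decreasing, the right-endpoint sum underestimates the integral, so the gap is positive, and J times the gap stays bounded, which is the O(1/J) behavior stated in Result 2.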


IV.  Conclusions 

Our interference cancellation scheme involves $J^2$ cancellation
steps. Suppose we wish to achieve a rate $(1 - \epsilon)C$ for each
user. We can first choose a J sufficiently large so that

$$C - \frac{W_0}{\ln 2} \sum_{j=1}^{J} \frac{\alpha/J}{1 + j\alpha/J} < \epsilon.$$

This J will then work for all sufficiently large M, in the sense
that for sufficiently large M, $C - R^{(M)} < \epsilon$. We conclude that
to be arbitrarily close to Shannon capacity, we do not need
more than O(1) cancellation steps, as the number of users
increases.

The results of the present paper are extended to the
multi-receiver radio network context, and to the case of multiple
power levels, in Hanly [1] and Hanly and Whiting [2]. Rimoldi
and Urbanke [3] give a scheme that can actually achieve Shannon
capacity with at most 2M cancellation steps, and this is
for any set of received powers, and any point in the feasible
rate region. We suggest that the complexity of their scheme,
at least for the equal rate and equal powers case, may be
further reduced, at a small price in terms of bit rate, by
incorporating our group cancellation approach in their procedure.

Abstract — Multiterminal estimation theory
concerns the maximum Fisher information attainable under a
Shannon information (rate) restriction. In the single-terminal
case the problem is trivial, because the maximum
Fisher information can be attained at asymptotically
zero rate. Han and Amari [1] discuss this
problem in general and give a lower bound on the
maximum Fisher information under a rate restriction.
Their approach is based on a Slepian-Wolf type rate
region. In this paper, we give an example, the binary
symmetric case, which shows that sufficient statistics
can be sent at a rate outside the SW region using
Körner and Marton's method [2], and show that
it gives a better bound than that of Han and
Amari [1]. Finally, we give the general form of the
parametric families whose sufficient statistics can be
sent at a rate in the KM type region.


II.  Main  Result 

Theorem 1

We assume that the parametric family $P_{XY}(\theta)$ is defined in the
region $0 < \theta < \theta'$ or $1 - \theta' < \theta < 1$, where $0 < \theta' < 1/2$. If $R > H(\theta')$,
$\theta$ can be estimated without loss of information, that is,
with the same variance as when $x^n$ and $y^n$ can be observed.

This theorem can be proven by the following techniques.

• Minimum entropy decoding for universal coding.

• The method to send the binary sum (Körner and
Marton [2]).

This theorem implies that the sufficient statistics of $\theta$ can
be sent at rate $H(\theta')$. On the other hand, Han and Amari [1]
need rate $(1 + H(\theta'))/2$ to attain the same variance as when $x^n$
and $y^n$ can be observed.
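
The two rate requirements, and the time-sharing bound of Corollary 1, are easy to tabulate. The sketch below is illustrative only: it assumes the binary symmetric family of this paper, for which the per-sample Fisher information with full observation is $1/(\theta(1 - \theta))$, and it requires $0 < \theta' < 1/2$ so that $H(\theta') > 0$. All function names are hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def km_rate(theta_prime):
    """Rate H(theta') that suffices according to Theorem 1."""
    return h2(theta_prime)


def han_amari_rate(theta_prime):
    """Rate (1 + H(theta')) / 2 quoted for the Han and Amari approach."""
    return (1.0 + h2(theta_prime)) / 2.0


def full_fisher_information(theta):
    """Per-sample Fisher information when x^n and y^n are both observed."""
    return 1.0 / (theta * (1.0 - theta))


def time_sharing_bound(theta, theta_prime, rate):
    """Lower bound on I*(theta; R): use rate H(theta') a fraction R / H(theta') of the time."""
    share = min(1.0, rate / h2(theta_prime))
    return share * full_fisher_information(theta)


if __name__ == "__main__":
    for tp in (0.01, 0.05, 0.1, 0.25, 0.4):
        print(f"theta'={tp:4.2f}  H={km_rate(tp):.4f}  (1+H)/2={han_amari_rate(tp):.4f}")
```

Since $H(\theta') \le 1$, the rate $H(\theta')$ never exceeds $(1 + H(\theta'))/2$, and the difference is largest when $\theta'$ is small.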

Corollary 1

By a simple time-sharing method, $I^*(\theta; R)$ is bounded as in (2).


I.  Introduction 

Let X and Y be discrete i.i.d. sources with a joint
probability distribution $P_{XY}(\theta)$, where $x^n$ and $y^n$ are encoded
independently at rate R. The encoded messages are denoted
by $u_n = f_x(x^n)$ and $v_n = f_y(y^n)$. The estimator $\hat{\theta}_n$ estimates
$\theta$ from $u_n, v_n$. In this paper, we discuss the minimum rate
at which we can estimate $\theta$ from $u_n, v_n$ with the same estimation error
as from $x^n, y^n$.

An encoder $f_n$ and an estimator $\hat{\theta}_n$ must have the following
properties.

• Rate restriction: $\frac{1}{n} \log \|f_n\| \le R$.

• Asymptotically efficient: $\lim_{n \to \infty} E_\theta[\hat{\theta}_n] = \theta$.

Here, we consider the variance of the estimator, $V_n$, and its
inverse $I_n$:

$$V_n(\theta; f_x, f_y) = E_\theta[(\hat{\theta}_n - \theta)^2] = \frac{1}{I_n(\theta; f_x, f_y)}.$$

Our aim is to maximize $I_n$ under the rate restriction. Let $I^*$
be defined by

$$I^*(\theta; R) = \lim_{n \to \infty} \frac{1}{n} \max_{f_x, f_y, \hat{\theta}_n} I_n(\theta; f_x, f_y).$$

For simplicity, we consider the binary symmetric case, in which
$P_{XY}$ is given by the following.

$$P_{XY}(\theta) = \begin{pmatrix} \theta/2 & (1 - \theta)/2 \\ (1 - \theta)/2 & \theta/2 \end{pmatrix} \qquad (1)$$


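The time-sharing bound of Corollary 1 is

$$I^*(\theta; R) \ge \begin{cases} \dfrac{1}{\theta(1 - \theta)} \, \dfrac{R}{H(\theta')}, & R < H(\theta'), \\[2mm] \dfrac{1}{\theta(1 - \theta)}, & \text{otherwise.} \end{cases} \qquad (2)$$

This bound is tighter than $I_{Han}$ especially when $\theta'$ is close
to 0 or 1, and R is close to

These results are obtained by considering an alphabet on GF(2),
and we can easily extend these results to GF($p^k$), where p is
a prime number. For example, on GF($2^2$), we consider the
following parametric family.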


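$$\begin{pmatrix}
\theta_1/8 & (1 - \theta_1)/8 & \theta_2/8 & (1 - \theta_2)/8 \\
(1 - \theta_1)/8 & \theta_1/8 & (1 - \theta_2)/8 & \theta_2/8 \\
\theta_2/8 & (1 - \theta_2)/8 & \theta_1/8 & (1 - \theta_1)/8 \\
(1 - \theta_2)/8 & \theta_2/8 & (1 - \theta_1)/8 & \theta_1/8
\end{pmatrix} \qquad (3)$$

In general, we consider a parametric family on GF($p^k$)
such that

• X and Y each have an alphabet of size $p^k$,

• there are $(p - 1)p^{k-1}$ parameters,

• if $x + y = x' + y'$ (on GF($p^k$)) then $\Pr\{X = x, Y = y\} =
\Pr\{X = x', Y = y'\}$.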

Theorem 2

For the parametric families above, sending the sufficient
statistics of the parameters requires less rate than sending $x^n, y^n$.

The details and proof of this theorem will be given in the
full paper.

Abstract — We show that the coding problem of any
m-user asynchronous discrete multiple access channel
can be reduced to at most 2m - 1 single user coding
problems. This extends previous results for the Gaussian
channel.

I.  Introduction 

Consider an m-user discrete memoryless channel. This is
defined in terms of m finite input alphabets $\mathcal{X}_i$, $i = 1, \ldots, m$,
an output alphabet $\mathcal{Y}$ and a transition probability matrix
$p(y | x_1, \ldots, x_m)$. It is well known [1] that the capacity region
is the convex hull of the union of rate regions that are achievable
for a fixed set of input distributions, $\prod_{i=1}^{m} p_i(x_i)$, such
that $\sum_{i \in \mathcal{L}} R_i \le I(X_{\mathcal{L}}; Y | X_{\mathcal{L}^c})$, $\forall \mathcal{L} \subseteq \{1, \ldots, m\}$. Hui
and Humblet have shown [2] that without the convex hull operation,
the remaining region describes the rate-tuples which
can be achieved without time synchronization. The proposed
scheme is for such asynchronous channels.

Although  the  theoretical  limits  of  discrete  memoryless  mul¬ 
tiple  access  channels  are  well  understood,  there  are  few  exam¬ 
ples  of  multiple  access  channels  for  which  explicit  and  efficient 
codes  are  known.  By  contrast,  significant  progress  has  been 
made  for  single  user  channels  of  practical  interest,  most  no¬ 
tably  the  Gaussian  channel  at  low  and  high  signal-to-noise 
ratios.  It  is  to  be  expected  that  the  single  user  problem  will 
always  be  better  understood  and  techniques  to  its  solution  will 
be  more  numerous  and  efficient  than  for  the  multiple  access 
problem.  The  key  contribution  of  this  paper  is  to  translate  the 
problem  of  finding  coding  schemes  for  a  given  discrete  mem¬ 
oryless  multiple  access  channel  into  the  one  of  finding  such 
schemes  for  an  appropriately  defined  single  user  channel. 

Vertices  of  the  capacity  region  can  be  achieved  by  successive 
cancellation  [1]  of  m  single  user  codes.  We  show  that  any 
point  in  an  m-dimensional  asynchronous  capacity  region  can 
be  viewed  as  a  vertex  in  an  appropriately  defined  (2m  -  1)- 
dimensional  asynchronous  capacity  region.  This  extends  the 
result  in  [3]  for  the  Gaussian  case. 

II.  The  Result  and  Proof  for  Two  User  Case 

Theorem  1  Any  rate  tuple  in  the  asynchronous  capacity  re¬ 
gion  of  a  discrete  m-user  multiple  access  channel  can  be 
achieved  by  means  of  single-user  decoding  of  at  most  2m  —  1 
users. 

This work was supported in part by Telecom Australia under
Contract No. 7368 and by the Commonwealth of Australia under
International S & T Grant No. 56 as well as by National Science
Foundation Grant NCR-9357689 and NCR-9304763.

Proof for m = 2. Without loss of generality [4] assume a
rate-tuple $(R_1, R_2)$ such that $R_1 + R_2 = I(X_1, X_2; Y)$,
$R_1 \le I(X_1; Y | X_2)$, $R_2 \le I(X_2; Y | X_1)$. Assume it is possible to
write $X_1 = f(U, V)$ for some function $f$ and random variables
U and V which are mutually independent and independent of
$X_2$. Then

$$R_1 + R_2 = I(X_1, X_2; Y) = I(U, V, X_2; Y) = I(U; Y) + I(X_2; Y | U) + I(V; Y | U, X_2). \qquad (1)$$

If we can choose the distribution on U and V such that
$R_2 = I(X_2; Y | U)$, then (1) shows that single user decoding
can be employed, decoding first the input corresponding to
U, then the input corresponding to $X_2$, and finally the input
corresponding to V, i.e., (1) describes a vertex.

Let U and V have the same alphabet as $X_1 \in \{1, \ldots, J\}$
and let $f(u, v) = \max\{u, v\}$. Let the distributions on U, V,
and $X_1$ be $p_U$, $p_V$, and $p_{X_1}$, respectively. Define
$p_U(\epsilon) = \epsilon p_{X_1} + (1 - \epsilon) e$, $\epsilon \in [0, 1]$, where $e$ is the distribution with all
its weight on the first element. It can be verified that for any
$\epsilon \in [0, 1]$ a well defined $p_V$ exists such that $X_1 = f(U, V)$.
Furthermore, if $\epsilon = 0$ then $I(X_2; Y | U) = I(X_2; Y)$ whereas
if $\epsilon = 1$ then $I(X_2; Y | U) = I(X_2; Y | X_1)$. Since
$I(X_2; Y) \le R_2 \le I(X_2; Y | X_1)$ the claim then follows by continuity. □

Note that $f(u, v) = \max\{u, v\}$ is not the only possible function,
but this particular choice leads to a simple proof.

III.  An  Example 

Consider the binary multiplier channel, where the channel inputs
$X_1$ and $X_2$ as well as the channel output $Y = X_1 X_2$
are elements of {0, 1}. The capacity region of the binary multiplier
channel is well known [1, p. 390] and is characterized
by the set of rate tuples $(R_1, R_2)$ such that $R_1 + R_2 \le 1$. To
achieve the rate tuple $(R_1, R_2) = (0.5, 0.5)$ we may choose the
input distributions to be $p_{X_1} = p_{X_2} = (1 - 1/\sqrt{2}, 1/\sqrt{2})$ and
let $f(u, v) = \max\{u, v\}$. The appropriate input distributions
for U and V are $p_U = (0.57, 0.43)$ and $p_V = (0.51, 0.49)$. It
follows that $(R_V, R_{X_2}, R_U) = (0.41, 0.5, 0.09)$ is a single-user
decodable rate triple for the new channel.
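
The numbers in this example can be reproduced with a short calculation. The sketch below is an illustration under the assumptions of the example (binary alphabet {0, 1} with 0 as the first element, $f(u, v) = \max\{u, v\}$, and $\Pr\{X_1 = 1\} = \Pr\{X_2 = 1\} = 1/\sqrt{2}$); the function names and the bisection search for $\epsilon$ are hypothetical choices and are not taken from the paper.

```python
import math

A = 1.0 / math.sqrt(2.0)  # P(X1 = 1) = P(X2 = 1), as in the example


def h2(x):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def split_rates(eps):
    """Rates (R_U, R_X2, R_V) for p_U = eps * p_X1 + (1 - eps) * e."""
    u1 = eps * A                 # P(U = 1)
    u0 = 1.0 - u1                # P(U = 0)
    v0 = (1.0 - A) / u0          # P(V = 0), so that P(max(U, V) = 0) = P(X1 = 0)
    v1 = 1.0 - v0
    h_y = h2(A * A)                                # Y = X1 * X2
    h_y_given_u = u1 * h2(A) + u0 * h2(v1 * A)     # U = 1 forces X1 = 1
    h_y_given_u_x2 = u0 * A * h2(v1)               # only U = 0, X2 = 1 leaves Y = V
    r_u = h_y - h_y_given_u                        # I(U; Y)
    r_x2 = h_y_given_u - h_y_given_u_x2            # I(X2; Y | U)
    r_v = h_y_given_u_x2                           # I(V; Y | U, X2)
    return r_u, r_x2, r_v, (u0, u1), (v0, v1)


def solve_eps(target=0.5, iters=60):
    """Bisection on eps so that I(X2; Y | U) equals the target rate R2."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if split_rates(mid)[1] < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    eps = solve_eps()
    r_u, r_x2, r_v, p_u, p_v = split_rates(eps)
    print(f"eps = {eps:.4f}")
    print(f"p_U = ({p_u[0]:.2f}, {p_u[1]:.2f}), p_V = ({p_v[0]:.2f}, {p_v[1]:.2f})")
    print(f"(R_V, R_X2, R_U) = ({r_v:.3f}, {r_x2:.3f}, {r_u:.3f})")
```

The three rates always sum to $H(Y) = 1$ bit, and the bisection relies only on the continuity argument used in the proof.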

Abstract — The admissible rate-distortion region
is determined for a triangular communication system
shown in Fig. 1.

I.  Summary 

Consider a triangular communication system (TCS) shown
in Fig. 1, where the source outputs X and Y are i.i.d. but mutually
correlated random variables, which take values in finite
sets $\mathcal{X}$ and $\mathcal{Y}$, respectively. The decoders' outputs $\hat{X} \in \hat{\mathcal{X}}$ and
$\hat{Y} \in \hat{\mathcal{Y}}$ are allowed distortion, which is measured by distortion
measures $d_X(X, \hat{X}) < \infty$ and $d_Y(Y, \hat{Y}) < \infty$, respectively.
We consider block coding. Hence, for $X^K = (X_1, X_2, \ldots, X_K)$
and $Y^K = (Y_1, Y_2, \ldots, Y_K)$, the encoder $f$ and the decoders
$g_X$ and $g_Y$ are defined as the following mappings.

$$(W_X, W_Y) = f(X^K, Y^K),$$

$$\hat{X}^K = (\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_K) = g_X(W_X, V),$$

$$\hat{Y}^K = (\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_K) = g_Y(W_Y, U),$$

where $W_X$ and $W_Y$ are sent to decoders $g_X$ and $g_Y$,
respectively, and $U = (U_{[1]}, U_{[2]}, \ldots, U_{[L]})$ and
$V = (V_{[1]}, V_{[2]}, \ldots, V_{[L]})$ are codewords to communicate between
the two decoders $g_X$ and $g_Y$, and they are defined by

r/M  =  1  =  i,2,  •••,£, 

vw  =  yW(Wy,UllhUl2]r--,U[e-1]), 

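Letting $W_X \in I(M_X)$, $W_Y \in I(M_Y)$, $U_{[\ell]} \in I(M_{U[\ell]})$, and
$V_{[\ell]} \in I(M_{V[\ell]})$, where $I(M) = \{0, 1, 2, \ldots, M - 1\}$, the rate
of each channel is defined as

$$R_X = \frac{1}{K} \log M_X, \qquad R_Y = \frac{1}{K} \log M_Y,$$

$$R_U = \frac{1}{K} \sum_{\ell=1}^{L} \log M_{U[\ell]}, \qquad R_V = \frac{1}{K} \sum_{\ell=1}^{L} \log M_{V[\ell]}.$$

For $(X^K, \hat{X}^K)$ and $(Y^K, \hat{Y}^K)$, each distortion is measured
by the averaged single letter distortion measure,
i.e. $d_X^{(K)}(X^K, \hat{X}^K) = \frac{1}{K} \sum_{k=1}^{K} d_X(X_k, \hat{X}_k)$,
$d_Y^{(K)}(Y^K, \hat{Y}^K) = \frac{1}{K} \sum_{k=1}^{K} d_Y(Y_k, \hat{Y}_k)$.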
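
As a small illustration of the averaged single letter distortion, the sketch below evaluates it for one block. The Hamming measure is an assumption made only for this example (the paper allows any finite distortion measure), and the function names are hypothetical.

```python
def hamming(a, b):
    """Hamming distortion: 0 if the letters agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def average_distortion(source_block, reproduction_block, d=hamming):
    """Averaged single letter distortion (1/K) * sum_k d(x_k, xhat_k)."""
    if len(source_block) != len(reproduction_block):
        raise ValueError("blocks must have the same length K")
    if not source_block:
        raise ValueError("blocks must be non-empty")
    K = len(source_block)
    return sum(d(x, xh) for x, xh in zip(source_block, reproduction_block)) / K


if __name__ == "__main__":
    x = [0, 1, 1, 0]
    x_hat = [0, 1, 0, 0]
    print(average_distortion(x, x_hat))
```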

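A rate-distortion tuple $(R_X, R_Y, R_U, R_V, D_X, D_Y)$ is called
admissible if for any $\epsilon > 0$, sufficiently large K and some
finite L, there exists a code (the encoder $f$, the decoders $g_X$ and $g_Y$,
and the inter-decoder mappings for $\ell = 1, 2, \ldots, L$)
that satisfies

$$E\, d_X^{(K)}(X^K, \hat{X}^K) \le D_X + \epsilon,$$

$$E\, d_Y^{(K)}(Y^K, \hat{Y}^K) \le D_Y + \epsilon.$$

The admissible rate-distortion region $\mathcal{R}$ for the TCS is defined
as

$$\mathcal{R} = \{(R_X, R_Y, R_U, R_V, D_X, D_Y) : (R_X, R_Y, R_U, R_V, D_X, D_Y) \text{ is admissible}\}.$$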


Figure  1:  Triangular  Communication  System 

This admissible region $\mathcal{R}$ is determined by the following
theorem.

Theorem 1

$$\begin{aligned}
\mathcal{R} = \{ (R_X, R_Y, R_U, R_V, D_X, D_Y) : \ & R_X \ge R_{X|S}(D_X), \quad R_Y \ge R_{Y|S}(D_Y), \\
& R_X + R_V \ge R_{X|S}(D_X) + I(XY; S), \\
& R_Y + R_U \ge R_{Y|S}(D_Y) + I(XY; S), \\
& R_U + R_V \ge I(XY; S), \\
& R_X + R_Y \ge R_{X|S}(D_X) + R_{Y|S}(D_Y) + I(XY; S), \\
& \text{for some auxiliary random variable } S \in \mathcal{S} \\
& \text{such that } |\mathcal{S}| \le |\mathcal{X}||\mathcal{Y}| + 2 \},
\end{aligned}$$

where $R_{X|S}(D_X)$ and $R_{Y|S}(D_Y)$ are the conditional
rate-distortion functions.

Although  the  proof  of  the  converse  part  is  complicated,  the 
direct  part  can  easily  be  proved  by  using  the  code  of  the  Gray- 
Wyner  system  [1].  From  the  proof  of  the  direct  part,  the 
admissible  region  can  be  attained  with  L  =  1.  Hence,  the  it¬ 
erative  communication  between  the  decoders  is  not  necessary. 

It is also worth noticing that when $R_X + R_Y = H(XY)$
in the distortionless case, the minimum of $R_U + R_V$ is equal
to Wyner's common information [2]. Hence, Wyner's
common information can be explained as the minimum rate
necessary to communicate between the decoders in the TCS
under the conditions that

•  the  total  rate  sent  from  the  encoder  to  the  decoders  is 

minimum, 

•  X  and  Y  must  be  reproduced  with  arbitrarily  small  error 

probability. 

Abstract — An extension of Shannon's inequality to
discrete probability distributions with an infinite number
of elements is considered. As an application, the
asymptotic capacity of the T-user binary adder channel is
exactly determined. The previously known asymptotically
good (but not very good) T-user code of Chang and Weldon
is shown to be far from asymptotically very good. It is
thus concluded that the achievability problem of the
asymptotic capacity remains open.


I.  Extended  Shannon’s  Inequality 

Denote the set of N-dimensional positive real-valued vectors by $R_+^N$.
Let $X(N) = \{(x_1, x_2, \ldots, x_N) \in R_+^N : \sum_{k=1}^{N} x_k = 1\}$.

Theorem 1 (I) For any integer $N \in [2, \infty]$, and for all
$p \in X(N)$ and $q \in R_+^N$, define the function
$f_N(p, q) = \sum_{k=1}^{N} p_k \log(p_k/q_k)$, then

$$f_N(p, q) \ge 0 \quad \text{if } \sum_{k=1}^{N} q_k \le 1, \qquad (1)$$

$$f_N(p, q) \le 0 \quad \text{if } \sum_{k=1}^{N} p_k^2/q_k \le 1. \qquad (2)$$

(II) For $N \in [2, \infty)$, if $\sum_{k=1}^{N} q_k = 1$, a necessary and sufficient
condition for $f_N(p, q) = 0$ is

$$\sum_{k=1}^{N} p_k^2/q_k = 1, \qquad (3)$$

or equivalently,

$$p_k = q_k \quad \text{for } k = 1, 2, \ldots, N. \qquad (4)$$

Remarks: (i) The base of the logarithmic function in the
definition of $f_N$ is arbitrary provided it is greater than 1. (ii)
The condition (4) in part II is in fact a classical form of Shannon's
inequality [1], [4]. The extension actually refers to part I.
(iii) In part II, if the hypothesis is replaced by $\sum_{k=1}^{N} p_k^2/q_k = 1$
and (3) by $\sum_{k=1}^{N} q_k = 1$, the result is still valid. (iv) (3) is a
sufficient condition for infinite N, and we conjecture that it is
also a necessary condition, as it is for every finite N.


II.  Asymptotic  Capacity  of  T-User  Binary 
Adder  Channel 

As an application of Part I of Theorem 1, we now consider
the asymptotic capacity of the T-user binary adder channel.
We refer the reader to [2], [3] for the background of this topic.
The (sum) capacity is defined by $C_{sum}(T) = -\sum_{i=0}^{T} c_i \log_2 c_i$,
where $c_i = \binom{T}{i}/2^T$. Wolf [5] observed that the maximal
achievable rate sum is about $\frac{1}{2} \log_2(\pi e T/2)$ for such a channel.
Chang and Weldon [3] proved that

$$\frac{1}{2} \log_2 \frac{\pi T}{2} \le C_{sum}(T) \le \begin{cases} \frac{1}{2} \log_2(\pi e T/2), & \text{even } T, \\ \frac{1}{2} \log_2(\pi e (T + 1)/2), & \text{odd } T. \end{cases} \qquad (5)$$


1This  work  was  supported  by  the  Croucher  Foundation  Fellow¬ 
They also conjectured that $\frac{1}{2}\log_2(\pi e T/2)$ is an upper bound for odd $T$. Recently, Blake [2] observed that $1 + \frac{1}{2}\log_2 T$ is a much tighter lower bound, and that $\frac{1}{2}\log_2(\pi e T/2)$, as an upper bound, is very tight (cf. [2, Table 1]). These observations motivated our work.
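The bounds discussed above are easy to compare numerically for small $T$. The following is a minimal sketch, assuming $C_{\mathrm{sum}}(T)$ is the entropy in bits of the binomial weights $c_i = \binom{T}{i}/2^T$; the function names are hypothetical and the script is not taken from the cited papers.

```python
import math


def c_sum(T):
    """Entropy in bits of the weights c_i = C(T, i) / 2^T."""
    total = 0.0
    for i in range(T + 1):
        c = math.comb(T, i) / 2 ** T
        if c > 0.0:  # guard against floating-point underflow for very large T
            total -= c * math.log2(c)
    return total


def chang_weldon_bounds(T):
    """Lower and upper bounds as in (5); the odd case uses T + 1."""
    lower = 0.5 * math.log2(math.pi * T / 2)
    t_eff = T if T % 2 == 0 else T + 1
    upper = 0.5 * math.log2(math.pi * math.e * t_eff / 2)
    return lower, upper


def blake_lower(T):
    """The lower bound 1 + (1/2) log2 T attributed to Blake."""
    return 1 + 0.5 * math.log2(T)


if __name__ == "__main__":
    for T in (1, 2, 3, 10, 100):
        lo, hi = chang_weldon_bounds(T)
        print(T, round(lo, 4), round(blake_lower(T), 4),
              round(c_sum(T), 4), round(hi, 4))
```

Printing the four quantities side by side shows how narrow the gap between the tighter lower bound and the upper bound already is for moderate $T$.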

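In [3], a T-user code is said to be asymptotically good if its rate sum $R_{\mathrm{sum}}(T)$ satisfies

$$\lim_{T \to \infty} \frac{R_{\mathrm{sum}}(T)}{\frac{1}{2}\log_2 T} = \lim_{T \to \infty} \frac{C_{\mathrm{sum}}(T)}{\frac{1}{2}\log_2 T} = 1, \qquad (6)$$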


where  the  last  equality  follows  from  (5).  Constructions  of 
asymptotically  good  T-user  codes  were  also  given  in  [3]. 

Our next theorem, whose proof invokes Part I of Theorem 1, determines the exact asymptotic value of $C_{\mathrm{sum}}(T)$.

Theorem 2

$$\lim_{T \to \infty} \left( C_{\mathrm{sum}}(T) - \frac{1}{2}\log_2\frac{\pi e T}{2} \right) = 0. \qquad (7)$$

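In view of Theorem 2, a T-user code is said to be asymptotically very good if its rate sum $R_{\mathrm{sum}}(T)$ satisfies

$$\lim_{T \to \infty} \left( R_{\mathrm{sum}}(T) - \frac{1}{2}\log_2\frac{\pi e T}{2} \right) = 0. \qquad (8)$$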

The  asymptotically  good  T-user  code  of  Chang  and  Weldon 
[3]  is  not  asymptotically  very  good.  (In  fact,  the  r.h.s.  of 
(8)  is  oo  instead  of  0!)  Hence,  the  achievability  problem  of 
the  asymptotic  capacity  of  T-user  binary  adder  channel,  or 
equivalently,  the  existence  problem  of  the  asymptotically  very 
good  T-user  code  remains  open. 
Abstract

A  sender  communicates  with  a  receiver  who  wishes  to 
reliably  evaluate  a  function  of  their  combined  data.  We 
show  that  if  only  the  sender  can  transmit,  the  number 
of  bits  required  is  a  conditional  entropy  of  a  naturally 
defined  graph.  We  also  determine  the  number  of  bits 
needed  when  the  communicators  exchange  two  messages. 

I  Introduction 

$f$ is a function of two random variables $X$ and $Y$. A sender $P_X$ knows $X$, a receiver $P_Y$ knows $Y$, and both want $P_Y$ to reliably determine $f(X, Y)$. How many bits must $P_X$ transmit?

Embedding this communication-complexity scenario (Yao [6]) in the standard information-theoretic setting (Shannon [4]), we assume that (1) $f(X, Y)$ must be determined for a block of many independent $(X, Y)$-instances, (2) $P_X$ transmits after observing the whole block of $X$-instances, (3) a vanishing block error probability is allowed, and (4) the problem's rate $L_f(X|Y)$ is the number of bits transmitted for the block, normalized by the number of instances.

Two naive bounds are easily established. $L_f(X|Y) \ge H(f(X,Y)|Y)$, the number of bits required when $P_X$ knows $Y$ in advance, and by a simple application of the Slepian-Wolf Theorem, $L_f(X|Y) \le \min\{H(g(X)|Y) : g(X) \text{ and } Y \text{ determine } f(X,Y)\}$. Both bounds are tight in special cases, but not in general.
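To make the naive lower bound concrete, the sketch below evaluates $H(f(X,Y)|Y)$ for a small joint distribution. It is an illustration under stated assumptions: the joint pmf is passed as a dictionary keyed by `(x, y)` pairs, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def cond_entropy_of_f_given_y(joint, f):
    """H(f(X,Y) | Y) in bits, for a joint pmf given as {(x, y): probability}."""
    p_y = defaultdict(float)   # marginal of Y
    p_zy = defaultdict(float)  # joint of (Z, Y) with Z = f(X, Y)
    for (x, y), p in joint.items():
        p_y[y] += p
        p_zy[(f(x, y), y)] += p
    h = 0.0
    for (z, y), p in p_zy.items():
        if p > 0:
            h -= p * math.log2(p / p_y[y])
    return h


if __name__ == "__main__":
    # X and Y independent fair bits.
    uniform = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
    print(cond_entropy_of_f_given_y(uniform, lambda x, y: x ^ y))
    print(cond_entropy_of_f_given_y(uniform, lambda x, y: x & y))
```

For independent fair bits, XOR needs one full bit per instance even when the sender knows $Y$, whereas AND needs only half a bit, because the value of $X$ is irrelevant whenever $Y = 0$.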

Drawing on rate-distortion results, we show that for every $X$, $Y$, and $f$,

$L_f(X|Y) = H_G(X|Y)$. (1)

$G$ is a simply-defined characteristic graph of $X$, $Y$, and $f$. $H_G(X|Y)$ is the conditional $G$-entropy of $X$ given $Y$. It extends $H_G(X)$, the $G$-entropy of $X$, defined by Körner [3], also called the graph entropy of $G$ and $X$. Graph entropy has recently been used to derive an alternative characterization of perfect graphs, lower bounds on perfect hashing, lower bounds for Boolean formula size, and algorithms for sorting.

The lower bound ($\ge$) in (1) is proven via an analogy between $H_G(X|Y)$ and rate-distortion results of Wyner and Ziv [5] and their extension in Csiszár and Körner [1]. The upper bound ($\le$) strengthens these rate-distortion results, showing that in certain applications the same rate suffices to achieve small block- and not just bit-error probability. The proof uses robust typicality, a more restrictive form of the asymptotic equipartition property.


We also consider the more general scenario in which the communicators can exchange two messages. $P_Y$ sends a message based on the block of $Y$ instances, and $P_X$ responds with a message based on $P_Y$'s message and the block of $X$ instances. Again, $P_Y$ must accurately evaluate all $f(X,Y)$'s. $P_X$'s transmission rate $r_x$, and $P_Y$'s transmission rate $r_y$, are the number of bits they transmit, normalized by the block length. We determine the region $R_f(X|Y)$ of possible rate pairs for all $X$, $Y$, and $f$.

Two random variables $U$ and $V$ are admissible if (1) $U - Y - X$, (2) $V - UX - Y$, and (3) $U$, $V$ and $Y$ determine $f(X, Y)$. We show that for every $(X, Y)$ and $f$,

$R_f(X|Y) = \{(r_x, r_y) : r_x \ge I(V; X|UY) \text{ and } r_y \ge I(U; Y|X) \text{ for some admissible } U \text{ and } V\}$.

The inner bound is derived by generalizing the one-way achievability results. To prove the (matching) outer bound, we extend results of Kaspi and Berger [2] to a larger class of distortion measures.
Abstract — Let the input to a computation problem be split between two processors connected by a communication link; and let an interactive protocol $\pi$ be known, by which on any input, the processors can solve the problem using no more than $T$ transmissions of bits between them, provided the channel is noiseless. We study the following question: If in fact there is some noise on the channel, what is the effect upon the number of transmissions needed in order to solve the communication problem reliably?

I.  Introduction 

Shannon,  in  his  seminal  study  of  communication  [3],  studied 
the  effect  of  noise  upon  “one-way”  communication  problems, 
i.e.  data  transmission.  His  fundamental  observation  was  that 
coding  schemes  which  did  not  treat  each  bit  separately,  but 
jointly  encoded  large  blocks  of  data  into  long  codewords,  could 
achieve  very  small  error  probability  (exponentially  small  in 
T),  while  slowing  down  by  only  a  constant  factor  relative  to 
the  T  transmissions  required  by  the  noiseless-channel  protocol 
(which  can  simply  send  the  bits  one  by  one).  The  constant 
(ratio  of  noiseless  to  noisy  communication  time)  is  a  property 
of  the  channel,  known  as  its  Shannon  capacity. 

The improvement in communication rate provided by Shannon's insight is dramatic: if the channel is memoryless, the naive protocol which repeats each bit several times can only achieve the same error probability by repeating each bit a number of times proportional to the length of the entire original protocol (for a total of $T^2$ communications). Moreover, in order to achieve any communication on “adversarial” or worst-case channels in which any set of a given number of transmissions may be garbled, such error-correcting codes are necessary. A precise statement of Shannon's coding theorem (for the special case of binary symmetric channels, BSCs) follows. With some loss in the capacity, a similar statement can be made for “adversarial” channels.

Theorem 1 (Shannon) Let a BSC of capacity $C$ be given. For every $T$ and every $\gamma > 0$ there exists a code $\chi : \{0,1\}^T \to \{0,1\}^{\frac{T}{C}(1+\gamma)}$ and a decoding map $\chi' : \{0,1\}^{\frac{T}{C}(1+\gamma)} \to \{0,1\}^T$ such that every codeword transmitted across the channel is decoded correctly with probability $1 - e^{-\Omega(T)}$.

Recently, in computer science, communication has come to be critical to distributed computing, parallel computing, and the performance of VLSI chips. In these contexts interaction is an essential part of the communication process, and its role has been extensively studied through the “communication complexity” model initiated by A. C. Yao [4] (see [1] for a survey). Noise afflicts interactive communications just as it does the one-way communications considered by Shannon, and for much the same reasons: physical devices are by nature noisy, and there is often a significant cost associated with making them so reliable that the noise can be ignored (by providing very strong transmitters, cooled circuits, etc.). To mitigate such costs we can design our systems to operate reliably even in the presence of some noise. The ability to transmit data in the presence of noise, the subject of Shannon's and subsequent work, is a necessary but far from sufficient condition for sustained interaction and computation.

Observe  that  in  the  case  of  an  interactive  protocol,  the 
processors  generally  do  not  know  what  they  want  to  transmit 
more  than  one  bit  ahead,  and  therefore  cannot  use  a  block 
code  as  in  the  one-way  case.  Another  difficulty  that  arises 
in  our  situation  but  not  for  data  transmission,  is  that  once 
an  error  has  occurred,  subsequent  exchanges  on  the  channel 
are  affected.  Such  exchanges  cannot  be  counted  on  to  be  of 
any  use  either  to  the  simulation  of  the  original  protocol,  or  to 
the  detection  of  the  error  condition.  Yet  the  processors  must 
be  able  to  recover,  and  resume  synchronized  execution  of  the 
intended  protocol,  following  any  sequence  of  errors,  although 
these  may  cause  them  to  have  very  different  records  of  the 
history  of  their  interaction.  In  spite  of  these  new  difficulties 
we  have: 

Theorem 2 In each direction between a pair of processors let a BSC of capacity $C$ be given. There is a deterministic communication protocol which, given any noiseless channel protocol $\pi$ of length (duration) $T$, simulates $\pi$ on the noisy channel in time $O(T/C)$ and with error probability $e^{-\Omega(T)}$.

In all but a constant factor in the rate, this is an exact analog, for the general case of interactive communication problems, of the Shannon coding theorem. A similar statement can be shown also for the case of “adversarial” channels.

As  part  of  our  work  we  introduce  and  show  the  existence  of 
a  new  class  of  codes,  “explicit”  tree  codes.  (These  are  different 
from,  though  in  part  inspired  by,  the  random  tree  codes  of 
the  sequential  decoding  literature.)  Computationally  effective 
(e.g.  polynomial-time)  construction  of  these  codes  is  an  open 
problem. We show that if these codes can be implemented
with  polynomial-time  computation,  then  so  can  the  encoding 
and  decoding  procedures  of  the  protocol.  To  be  precise:  Given 
an  oracle  for  a  tree  code,  the  expected  computation  time  of 
each  of  the  processors  implementing  our  protocol,  when  the 
communication  channels  are  BSCs,  is  polynomial  in  T. 

Our  results  are  described  more  fully  in  reference  [2]. 
Abstract — We show that any distributed protocol which runs on a noiseless network in time $T$ can be simulated on an identical noisy network with a slowdown factor proportional to $\log(d + 1)$, where $d$ is the maximum degree in the network, and with exponentially small probability of error.

I.  Introduction:  A  Coding  Theorem 

Shannon's coding theorem [2] can be stated as follows: The number of transmissions sufficient to send a $T$-bit message over a noisy channel with reliability $1 - \epsilon$ is asymptotic to $T/C$, where $0 < C < 1$ is the “channel capacity”, a function only of the noise characteristics of the channel. In addition, Shannon proves the converse: that this many transmissions are required.

Can we extend the theory to networks, with a number of links available for simultaneous use? In his work, Schulman [1] shows an analog of the Shannon coding theorem for a pair of processors running an interactive protocol in a model introduced by Yao [3] in his work on communication complexity.

The  main  theorem  in  our  work  (stated  most  simply  in  the 
case  of  a  noisy  network  in  which  each  connection  is  made  via 
a  binary  symmetric  channel  of  capacity  at  least  C  >  0)  is  the 
following: 

Theorem 1.1 Any protocol $\Pi$ which runs in time $T$ on a noiseless $N$-processor network of maximum degree $d$ can be simulated on that network if it is noisy, in time $O\!\left(\frac{T}{C}\log(d+1)\right)$. The probability that the simulation fails is at most $N e^{-\Omega(T)}$.

The  simulation  is  said  to  fail  if  any  processor  terminates  in 
a  state  other  than  that  which  it  would  have  arrived  at  in  the 
absence  of  noise. 

Model: Consider a network $\mathcal{N}$ with maximum degree $d$. We
make  the  following  assumptions.  First,  all  noise  in  our  system 
occurs  only  in  the  communication  links  between  processors. 
Second,  we  look  only  at  the  number  of  communication  bits 
and  ignore  the  computational  cost  of  the  protocol.  Third, 
we  require  that  our  protocol  be  event  driven  and  therefore 
implementable  in  the  asynchronous  setting  but  we  analyze  its 
correctness  and  efficiency  in  the  synchronous  setting.  We  do 
this  so  that  the  notion  of  the  trajectory  of  the  system  and 
thus  the  notion  of  simulation  is  well  defined. 

II.  Method 

On every channel of our network we will implement communications using tree codes, introduced by Schulman in [1].
The  noiseless  protocol  will  be  embedded  within  a  simulation 
that  uses  locally  initiated  (hence  asynchronous)  “backups”, 
followed  by  renewed  transmissions,  in  response  to  perceived 
errors  in  the  simulation. 
III. Analysis

We will have in mind a “space-time” diagram, where space corresponds to the topology of the network. By a path in space-time we mean a sequence of nodes $\{p_\tau\}$ in the network such that for each $\tau$, $p_\tau$ and $p_{\tau+1}$ are adjacent in the network. For a protocol $\Pi$, denote by $\Pi(t)$ the state (i.e. the combined state of all the processors) after $t$ time steps. We show that there is a $c$ such that for each noiseless network protocol $\Pi$ there is a protocol $\Sigma$ simulating $\Pi$ on the same network $\mathcal{N}$, so that in the presence of noise, $\Sigma(T)$ fails to reproduce $\Pi(t)$ only if there is a space-time path on which there are at least $T - ct$ corrupted bit transmissions. This follows from a (slightly involved) argument relating the delay of the simulation, which is only defined locally due to the asynchronous nature of the simulation, to a space-time path containing a corresponding number of errors. The argument uses the combinatorial properties of both the protocol and tree codes. The probability of having that many transmission errors is then bounded in standard fashion by using the channel model.

From  these  steps  we  obtain  theorem  1.1.  Similar  results 
can  be  derived  for  more  general  channel  models  which  require 
only  a  replacement  of  the  last  segment  of  the  argument. 

IV.  Discussion 

Our results are non-constructive: though we can show the existence of a simulation $\Sigma$, we are unable to produce $\Sigma$ explicitly. The impediment is exactly that explicit algorithmic
constructions  for  tree  codes  are  not  known.  Moreover,  the 
problem  of  decoding  tree  codes  is  solvable  given  a  “coding 
oracle.”  Resolving  the  computational  complexity  of  coding 
and  decoding  tree  codes  is  the  most  critical  open  issue  at  the 
conclusion  of  our  work. 

Second,  there  is  a  storage  space  overhead  which  is  separate 
from  that  incurred  due  to  the  cost  of  coding  and  decoding. 
This comes from the need, in our simulation $\Sigma$, for processors to be able to roll back computation; so in our protocol
processors  keep  a  record  of  all  their  past  states. 

If one is willing to tolerate error probabilities of the order of $1/\mathrm{poly}(T)$, then the above problems can be addressed: the storage overhead can be greatly reduced, and a sufficiently good tree code can be constructed.

We  conjecture  that  the  log(d  +  1)  slowdown  in  our  theorem 
is  necessary. 
Abstract — A representation technique is presented allowing for quick access of individual records from a static compressed dataset. Given a collection of key-record pairs, the representation allows the appropriate short record to be returned for any given key. The approach is a generalization of Perfect Address Hashing. The new approach, called Perfect Value Hashing, uses a carefully chosen pseudo-random number
generator  to  directly  produce  the  correct  record  for 
any  key  in  the  dataset.  This  contrasts  with  Address 
Hashing  where  the  random  number  provides  an  ad¬ 
dress  which  is  then  used  to  recover  the  record  from  a 
separate  table.  Value  Hashing  doesn’t  have  the  theo¬ 
retical  limitations  of  Address  Hashing,  and  in  practice 
is  more  space  efficient  for  records  of  size  less  than  36 
bits.  Value  Hashing  has  the  added  benefit  (important 
when  the  records  are  encoded  for  compression)  that 
variable  length  records  can  be  represented  without  an 
increase  in  the  size  of  the  encoded  records.  This  new 
technique  was  used  to  provide  random  access  from  a 
highly  compressed  spelling  dictionary. 

I.  Background  of  the  Problem 

Given  a  dataset  of  key-record  pairs,  the  general  problem  is  to 
represent  the  dataset  so  that  a  record  can  be  recovered  with¬ 
out  a  slow  search.  A  well  known  solution  is  to  sort  the  keys 
and  store  them  with  each  fixed  length  record.  This,  for  ex¬ 
ample,  is  the  method  used  to  organize  a  conventional  phone 
book.  Lookup  requires  a  search  logarithmic  in  the  number  of 
records.  A  faster  lookup  can  be  performed  by  storing  with 
the  dictionary  a  pseudo-random  number  generator  called  an 
Address  Hash.  This  function  takes  any  key  and  returns  a 
number  which  is  used  to  tell  where  the  associated  record  is 
stored.  The  equivalent  example  with  a  phone  book  is  where 
an  Address  Hash  would  convert  a  name  (the  key),  into  a  page 
(the  address)  where  the  phone  number  (the  record)  is  stored. 
This  allows  for  faster  access  because  the  search  is  now  limited 
to  one  page  of  the  phone  book,  and  so  the  speed  of  access 
is  independent  of  the  total  size  of  the  book.  It  is  possible 
to  represent  this  information  much  more  space  efficiently  by 
not storing the key at all. A Perfect Address Hash is a specially created Address Hash for a particular dataset which produces a different address for every record (i.e. every page in the example phone book has exactly one phone number). It provides for time efficient access because no search is needed among the records at a given address. An important result on Perfect Address Hashing by Mehlhorn [1] is that an overhead of approximately $(1/n)\ln(n^n/n!)/\ln(2) \approx 1.44$ bits per key is required to map $n$ keys to $n$ unique addresses (additional overhead is required if the records do not have a fixed length). Practical algorithms for finding Perfect Address Hash functions for large numbers of records ($10^6$) have been reported with a cost of 3.6 bits per record [2]. For small variable length compressed records, the size of the Perfect Address Hash function may be unacceptably large compared to the size of the compressed records.
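The 1.44-bit figure can be reproduced directly from the expression $(1/n)\ln(n^n/n!)/\ln 2$. The sketch below is only a numerical check; the function name is hypothetical and `math.lgamma` is used to evaluate $\ln n!$ without overflow.

```python
import math


def perfect_address_hash_overhead_bits(n):
    """(1/n) * ln(n^n / n!) / ln 2, with ln n! computed as lgamma(n + 1)."""
    return (n * math.log(n) - math.lgamma(n + 1)) / (n * math.log(2))


if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        print(n, round(perfect_address_hash_overhead_bits(n), 4))
```

By Stirling's formula the value approaches $1/\ln 2 \approx 1.4427$ bits per key from below as $n$ grows.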


II.  Solution 

Value Hashing is the method of using a pseudo-random number generator to calculate information about the record itself. For the phone book example a pseudo-random number generator would be created so that the number it returns for a given name is that person's phone number (or the bits of a prefix encoded representation of the phone number). This approach overcomes Mehlhorn's theoretical bound on overhead because each key does not map to a unique address. The achievability of the Slepian-Wolf [3] bound for broadcast channels [4] shows that the size of the Value Hash function (at least in theory) can be made independent of $n$ (and so the overhead goes to zero for large $n$). This is obvious if you consider that a random sequence of bits will duplicate the records of a database of size $n$ with probability $(1/2)^n$. In principle you could create Perfect Value Hashes by evaluating approximately $2^n$ hash parameters to see if they happen to regenerate the desired records for the keys in the dataset, and then encoding the index of the first successful mapping of keys to records. Using the entropy of the waiting time for first success (assuming each hash function is an independent trial producing all bit sequences of a given length equiprobably) it is easily shown that this index encoding requires approximately $n + 1/\ln(2)$ bits on average. By breaking down the search for hash functions into groups of $k$ bits it is possible to do small combinatoric searches on subsets of $k$ bits from $n$, so that the total size is approximately $(n/k)(k + 1/\ln(2))$ bits, i.e. an overhead of about $1/(k \ln 2)$ bits per binary record. (Ramakrishna noted that brute force is effective in finding Perfect Address Hash functions and proposed a composition scheme for minimizing worst case evaluation time [5-6].) With current computer speeds $k = 16$ is easily achievable, which implies 0.09 bits per binary record. A practical algorithm has been developed for finding Perfect Value Hash functions with an overhead of 0.1 bits per binary record. The average time required to evaluate the hash function is independent of $n$. The same technique can be used for non-binary records for increased speed in evaluation. This has been used to provide random access by key of any 4 bits from a highly compressed spelling dictionary.
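The brute-force search over groups of about $k$ keys can be sketched as follows. This is a toy illustration and not the practical algorithm mentioned above: SHA-256 stands in for the pseudo-random number generator, keys are assigned to groups by a second hash (an assumption made here so that no key needs to be stored), and all function names are hypothetical.

```python
import hashlib
import itertools
import math


def _digest_int(*parts):
    """Deterministic 64-bit integer derived from the given parts."""
    data = ":".join(str(p) for p in parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def value_bit(seed, key):
    """Pseudo-random bit produced for `key` under hash parameter `seed`."""
    return _digest_int("value", seed, key) & 1


def group_of(key, num_groups):
    """Group index of a key, computed from the key alone."""
    return _digest_int("group", key) % num_groups


def build_value_hash(records, k=4):
    """For each group, find the first seed that regenerates every record bit."""
    num_groups = max(1, math.ceil(len(records) / k))
    groups = [[] for _ in range(num_groups)]
    for key, bit in records.items():
        groups[group_of(key, num_groups)].append((key, bit))
    seeds = []
    for members in groups:
        # Expected number of trials is about 2^len(members).
        for seed in itertools.count():
            if all(value_bit(seed, key) == bit for key, bit in members):
                seeds.append(seed)
                break
    return seeds


def lookup(seeds, key):
    """Recover the one-bit record for a key that was in the dataset."""
    return value_bit(seeds[group_of(key, len(seeds))], key)


def overhead_bits_per_record(k):
    """Approximate overhead 1 / (k ln 2) bits per binary record."""
    return 1.0 / (k * math.log(2))
```

Only the list of seeds is stored; neither the keys nor the record bits appear in the representation. Looking up a key that was never in the dataset still returns a bit, which is the expected behaviour for a scheme that does not store keys.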
Abstract — This study further investigates and generalizes the database model introduced in [1] by applying new techniques to the problem of data retrieval. The problems analyzed are representative of important issues involved in storing data for context dependent retrieval from databases. They arise when simple storage devices such as tapes and disks are used to store relatively more complex data structures such as large multi-dimensional images. The mismatch between the physical nature of the storage device and the data structure, i.e., the manner in which its elements are requested, prevents some requests from being instantaneously accessible on the database. Hence, we have the non-trivial problem of designing the database so as to minimize the expected access time EA.

I.  Introduction 

The  basic  model  in  [1]  is  generalized  to  a  large  multi¬ 
dimensional  image  stored  onto  a  lower  dimensional  tape  where 
the  sequence  of  user  requests  is  modelled  as  a  random  walk 
on  the  image.  It  is  found  that  careful  use  of  redundancy  in 
the  storage  scheme  can  reduce  access  time  significantly  over 
the  no-redundancy  case.  As  an  interesting  information  theory 
problem,  we  examine  what  is  the  minimum  expected  access 
time  that  can  be  achieved  under  any  system  using  redundancy, 
a  cache,  and  multiple  tapes  (possibly  implementing  erasure- 
correcting  codes). 

II.  Expected  Access  Time,  EA 
Under a linear cost function and no redundancy, we find that the minimum access time, EA*, is dependent on the function $\psi$, which we define as the absolute central moment of a graph. For a graph $B$, $\psi(B) = \sum_{b \in B} d(b, c)$, where $c$ is the center of $B$ and $d(b, c)$ is the graph distance between the points $b$ and $c$. It is found that when storing a $d$-dimensional toroidal image $I$ onto a $t$-dimensional toroidal tape $T$ of equal volume under a linear cost function without redundancy, the minimum access time EA* is bounded by

EA* $\ge$ [...]

where each pixel/block of the image is represented as a node in the graph. In particular, the above bound is found to be tight when storing rectangular images onto 1-dimensional tape loops, i.e., EA* = [...] + $O(n^{d-2})$, where $n$ is the length of the sides of a cube image.

We also introduce a slight variation to the problem where the tape is a loop and the read head is restricted to moving in only one direction. We begin with expressions for the minimum achievable access time for storing images onto the unidirectional tape under a linear cost, EA* = $\frac{1}{2}\mathrm{vol}(I)$, and capped cost function, EA* = $\frac{1}{2}(l + 1)$. It is then demonstrated that redundancy can be used to improve EA* significantly under a capped cost function, EA* < $\sqrt{2l}$. On the other hand, we find simple counterexamples where redundancy only makes performance worse. In fact it is conjectured that redundancy cannot improve performance under the linear cost function for this model. This is found to be an interesting question in its own right and can be modelled in a game theory context.

III. Caching and Multiple Tapes

When storing a one-dimensional image without redundancy, a cache of size $C$ can be shown to reduce access time by a factor of at least $C + 1$. On the other hand EA* cannot be reduced by more than [...] + 1, where $n$ is the size of the image. We then examine how a cache and redundancy can be applied together to further improve performance. In cases where exact reconstructions are not required to satisfy user requests, we model the problem in a rate-distortion context and explore achievable distortion-access time pairs and plan to relate this work to that described in [3].

The cache problem is extended to utilizing multiple tapes/heads to improve performance. Using $T$ tapes under a block/file segmentation scheme, the access time is reduced by a factor of $T$. References [4] and [2] demonstrate problems where Reed-Solomon codes can be used to achieve a significant improvement over file segmentation. Using such erasure-correcting codes for the multiple tape problem, it is found that accessed data elements that would incur high retrieval costs can be treated as erasures. Then, using data from the other tape heads, the erasure can be reconstructed using the code.

IV. Conclusions

The preliminary results from analyzing the representative models discussed suggest that the most effective means for improving EA* is to use redundancy in the storage device. It is demonstrated, however, that this has to be done very carefully, otherwise performance degrades. Deciding whether to use a cache or multiple tape heads is less significant and the relative importance of the two depends on the model used.
We report on two types of results. The first is a study of
the  rate  of  decay  of  information  carried  by  a  signal  which  is 
being  propagated  over  a  noisy  channel.  The  second  is  a  series 
of  lower  bounds  on  the  depth,  size,  and  component  reliability 
of  noisy  logic  circuits  which  are  required  to  compute  some 
function  reliably.  The  arguments  used  for  the  circuit  results 
are  information-theoretic,  and  in  particular,  the  signal  decay 
result  is  essential  to  the  depth  lower  bound. 

Our  first  result  can  be  viewed  as  a  quantified  version  of 
the  data  processing  lemma,  for  the  case  of  Boolean  random 
variables. 

Theorem 1 (Signal Decay) If $X$, $Y$ are Boolean random variables and $Z$ is the output of the channel $\begin{bmatrix} 1-a & a \\ b & 1-b \end{bmatrix}$ on input $Y$, then $\frac{I(X;Z)}{I(X;Y)} \le \sin^2\theta$, where $\theta$ is the angle in the plane between the vectors $(\sqrt{1-a}, \sqrt{a})$ and $(\sqrt{b}, \sqrt{1-b})$.

It is worth emphasizing that the bound holds regardless of the distribution on $X$ and $Y$, and is a property of the channel alone. The bound is tight in that for any such channel, one can describe a joint distribution for $X$ and $Y$ so that $I(X;Z)/I(X;Y)$ is arbitrarily close to $\sin^2\theta$.
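The signal decay bound lends itself to a quick numerical experiment. The sketch below assumes the channel rows are $(1-a, a)$ for input $Y = 0$ and $(b, 1-b)$ for input $Y = 1$, so that $\cos\theta = \sqrt{(1-a)b} + \sqrt{a(1-b)}$; the function names are hypothetical and joint distributions are plain nested lists.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits for a joint pmf given as a nested list joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi


def sin2_theta(a, b):
    """sin^2 of the angle between (sqrt(1-a), sqrt(a)) and (sqrt(b), sqrt(1-b))."""
    cos_theta = math.sqrt((1 - a) * b) + math.sqrt(a * (1 - b))
    return 1.0 - cos_theta ** 2


def push_through(joint_xy, a, b):
    """Joint pmf of (X, Z) when Z is the channel output on input Y."""
    w = [[1 - a, a], [b, 1 - b]]  # w[y][z] = P(Z = z | Y = y)
    return [
        [sum(joint_xy[x][y] * w[y][z] for y in range(2)) for z in range(2)]
        for x in range(2)
    ]


if __name__ == "__main__":
    joint_xy = [[0.4, 0.1], [0.1, 0.4]]
    a = b = 0.1  # binary symmetric channel with crossover probability 0.1
    ratio = mutual_information(push_through(joint_xy, a, b)) / mutual_information(joint_xy)
    print(ratio, sin2_theta(a, b))
```

For a binary symmetric channel with crossover probability $\epsilon$, the expression reduces to $\sin^2\theta = (1 - 2\epsilon)^2$, so the information ratio in the example cannot exceed 0.64.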

The  previous  theorem  is  a  general  result  about  mutual  in¬ 
formation.  The  remaining  theorems  concern  the  noisy  circuit 
model  of  Von  Neumann  [7].  The  signal  decay  theorem  is  use¬ 
ful  in  proving  lower  bounds  on  the  structure  of  such  circuits 
whose  components  (i.e.  individual  logic  gates)  fail  with  some 
probability.  These  results  improve  and  simplify  all  previous 
lower  bounds  in  this  model. 

Theorem 2 (Noisy Circuit Depth) Let $f$ be a Boolean function which depends on $n$ inputs. Let $C$ be a circuit of depth $c$ using gates with at most $k$ inputs, where each gate fails independently with probability $(1-\xi)/2$. Suppose $C$ computes the function $f$ correctly on all inputs with probability at least $1-\delta$ where $\delta < 1/2$. Let $\Delta = 1 + \delta\log\delta + (1-\delta)\log(1-\delta)$.

• If $\xi^2 > 1/k$ then $c \ge \frac{\log(n\Delta)}{\log(k\xi^2)}$.

• If $\xi^2 \le 1/k$ then $n \le 1/\Delta$.

To  prove  this  theorem,  we  analyze  the  mutual  information 
between  the  input  to  the  noisy  circuit  and  its  output.  This 
information  must  be  large  since  the  circuit  reliably  computes 
the  function  /;  yet,  according  to  the  signal  decay  theorem, 
each  noisy  gate  in  the  circuit,  when  viewed  as  a  noisy  channel, 
decreases  information.  Together,  these  observations  imply  the 
lower  bound  on  circuit  depth.  This  improves  on  the  lower 
bounds  of  Pippenger  [5]  and  Feder  [1]. 

A  similar  technique,  using  a  different  measure  of  correlation 
than  mutual  information,  provides  a  lower  bound  on  noisy 
circuit  size. 
Theorem 3 (Noisy Circuit Size) Let $f$ be a Boolean function with sensitivity$^1$ $s$. Let $C$ be a circuit using gates with at most $k$ inputs, where each gate fails independently with probability $(1-\xi)/2$. Suppose $C$ computes the function $f$ correctly on all inputs with probability at least $1-\delta$ where $\delta < 1/2$; then the number of gates in $C$ is at least [...]


Previously, Gal [3], Reischuk and Schmeltz [6], and Gacs and
Gal [2] proved an $\Omega(s \log s)$ bound on reliable circuit size. Our
improvement is in the bound's dependence on component reliability.

Finally,  we  establish  a  threshold  for  component  reliability 
below  which  one  cannot  reliably  compute  all  functions. 

Theorem 4 (Component Reliability) For $k$ odd there exists
$\delta < 1/2$ such that for all Boolean functions $f$ there exists
a formula² (using gates with at most $k$ inputs, where each gate
fails independently with probability $\varepsilon$) which computes $f$
correctly on all inputs with probability at least $1 - \delta$ if and only


This extends work done by Hajek and Weller [4], who
showed the result for $k = 3$.

¹ The sensitivity of a function is the maximum (over all inputs) of
the number of bits in the input which, when changed individually,
change the function value.

² A formula is a circuit in which each gate has out-degree one.
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Abstract  —  It  is  shown  that  for  a  suitable  choice  of 
the parameters, multiple repetition feedback coding
achieves a rate close to capacity for an arbitrary
discrete memoryless channel. For wide-sense symmetric
channels the difference between the rate of a multiple
repetition feedback strategy and the channel capacity
can be written as an informational divergence.


I.  Multiple  repetition  feedback  coding 


Consider a discrete memoryless channel with input symbols
$0, \ldots, m-1$ and output symbols $0, \ldots, \hat{m}-1$. We assume
w.l.o.g. [2] that $\hat{m} \ge m$. The output symbols $j$, $m \le j < \hat{m}$,
can be seen as erasure symbols. The channel error probabilities are
denoted by $p_{ij}$ ($0 \le i < m$, $0 \le j < \hat{m}$). The idea of repetition
coding is the following: suppose that during transmission of a
message an $i \to j$ error occurs. The sender can detect this because
of the feedback link and 'corrects' the error by repeating the
symbol $i$ a fixed number $k_{ij}$ of times. The receiver scans the
received sequence from right to left and replaces each
subsequence $j\,i^{k_{ij}}$ by $i$. A consequence of repetition coding is that
messages have to be precoded, since no subsequence $j\,i^{k_{ij}}$ may
occur. For asymmetric channels precoding is also used to fix a
symbol precoding distribution $q = (q_0, \ldots, q_{m-1})$. This leads
to a precoding rate $R_p(q)$. The expected number of
transmissions needed to send symbol $i$ such that all occurring transmission
errors are corrected is $c_i = 1/(1 - \sum_{j \ne i} k_{ij} p_{ij})$. The rate of
a repetition feedback strategy with repetition parameters $k_{ij}$
($0 \le i < m$, $0 \le j < \hat{m}$) as a function of the symbol precoding
distribution $q$ is

$$R(q) = \frac{R_p(q)}{\sum_i q_i c_i}.$$

The rate $R$ is the maximum of $R(q)$ over all symbol precoding
distributions $q$.
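
The rate expression can be evaluated directly once the channel and the repetition parameters are fixed. The sketch below is only an illustration: it reads the formulas of this section as $c_i = 1/(1 - \sum_{j \ne i} k_{ij} p_{ij})$ and $R(q) = R_p(q) / \sum_i q_i c_i$, takes the precoding rate $R_p(q)$ as a given number rather than computing it, and uses hypothetical function names.

```python
def expected_transmissions(p_row, k_row, i):
    """c_i = 1 / (1 - sum_{j != i} k_ij * p_ij) for input symbol i.

    p_row[j] is the probability of receiving j when i is sent and
    k_row[j] is the repetition parameter used after an i -> j error.
    """
    load = sum(k * p for j, (p, k) in enumerate(zip(p_row, k_row)) if j != i)
    if load >= 1.0:
        raise ValueError("repetitions never terminate on average (load >= 1)")
    return 1.0 / (1.0 - load)


def feedback_rate(precoding_rate, q, p, k):
    """R(q) = R_p(q) / sum_i q_i * c_i, with R_p(q) supplied by the caller."""
    mean_transmissions = sum(
        q_i * expected_transmissions(p[i], k[i], i) for i, q_i in enumerate(q)
    )
    return precoding_rate / mean_transmissions


if __name__ == "__main__":
    # Binary symmetric channel with crossover 0.1 and k_01 = k_10 = 3.
    p = [[0.9, 0.1], [0.1, 0.9]]
    k = [[0, 3], [3, 0]]  # diagonal entries are unused
    print(feedback_rate(0.9, [0.5, 0.5], p, k))
```

The recursion behind $c_i$ is that every transmission of $i$ costs one channel use and, with probability $p_{ij}$, triggers $k_{ij}$ further transmissions of $i$, each of which may itself be hit by an error.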


II. Wide-sense symmetric channels

A discrete memoryless channel is called wide-sense symmetric
if the channel considered as a graph with labeled edges
satisfies:

1. All input nodes have the same bag of outgoing edge
labels.

2. All output nodes $j$, $0 \le j < m$, have the same bag of
incoming edge labels.

3. All output nodes $j$, $m \le j < \hat{m}$, have the same bag of
incoming edge labels.

Note that a bag is a set where elements can occur more than
once. The labels of the edges that come in at output nodes $j$,
$0 \le j < m$, are (in arbitrary order) denoted by $p_i$ ($0 \le i < m$).
The labels of the edges that come in at output nodes $j$,
$m \le j < \hat{m}$, are (in arbitrary order) denoted by $p_i$ ($m \le i < \hat{m}$). A
wide-sense symmetric channel has the property that capacity
is achieved for a uniform input distribution.

Suppose a multiple repetition feedback strategy is used for
such a channel. Each label $p_i$ ($0 \le i < \hat{m}$) will correspond to
a repetition parameter $k_i$. Note that $k_i = 1$ for $m \le i < \hat{m}$.


For reasons of symmetry the symbol precoding distribution
is no longer fixed during precoding, i.e. all messages without
forbidden subsequences are allowed. The rate of the repetition
strategy  satisfies  R  =  (l-Eo<j<m  loSm  where  x>m 
is  the  solution  of  Eo <i<mx~ki  mx~1-  From  l1]  follows 
that  this  rate  is  equal  to  the  capacity  of  the  channel  when  the 
channel  error  probabilities  satisfy  p;  =  Eo<jCmW 

for  0  <  i  <  m,  and  p,  =  -bEm<i<AW  for  ™  <'< 
$\hat{m}$. Denote the solution of these equations by $\tilde{p}_i$ ($0 \le i < \hat{m}$).
The following theorem shows how closely the repetition
strategy approaches channel capacity for an arbitrary
wide-sense symmetric channel.

Theorem 1 Consider an arbitrary wide-sense symmetric
channel with characteristic channel error probabilities $p_i$
($0 \le i < \hat{m}$) and capacity $C$. Let $R$ be the rate of the
multiple repetition feedback strategy with repetition parameters $k_i$
($0 \le i < \hat{m}$). Then

$$C - R = D_m\big((p_0, \ldots, p_{\hat{m}-1}) \,\|\, (\tilde{p}_0, \ldots, \tilde{p}_{\hat{m}-1})\big).$$

Here $D_m$ denotes the $m$-ary informational divergence and
$\tilde{p}_i$ the solution of the equations above.

III.  Arbitrary  channels 

For arbitrary channels it is difficult to obtain a simple
expression indicating the exact distance between the rate of a
multiple repetition feedback strategy and the channel capacity.
However, it is possible to show that for a suitable choice
of the repetition parameters, the rate will be close to capacity.
When analysing the wide-sense symmetric case, it follows that
the optimal channel error probabilities $\tilde{p}_i$ ($0 \le i < m$) satisfy
$k_i + \log_m(\tilde{p}_i / \sum_{0 \le j < m} \tilde{p}_j) \approx 0$ for large repetition
parameters. Therefore, for arbitrary discrete memoryless channels
with channel error probabilities $p_{ij}$ ($0 \le i < m$, $0 \le j < \hat{m}$),
we suggest using repetition parameters $k_{ij}$ such that
$k_{ij} \approx -\log_m(p_{ij} / \sum_{0 \le s < m} p_{is})$ for $0 \le j < m$. Note that $k_{ij}$ should
be equal to 1 for $m \le j < \hat{m}$.
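
The suggested choice of repetition parameters is easy to tabulate. The following sketch assumes the rule reads $k_{ij} \approx -\log_m(p_{ij} / \sum_{0 \le s < m} p_{is})$ for non-erasure outputs and $k_{ij} = 1$ for erasure outputs; rounding to the nearest integer (and never below 1) is an assumption made here to turn "approximately" into a concrete value, and the function name is hypothetical.

```python
import math


def suggested_repetition_parameters(p, m):
    """Tabulate k_ij for a channel matrix p with m inputs.

    p[i][j] is the probability of receiving j when i is sent; columns
    j >= m are treated as erasure symbols. Entries that need no
    repetition (the diagonal, or zero-probability errors) are None.
    """
    table = []
    for i, row in enumerate(p):
        norm = sum(row[:m])  # sum over the non-erasure outputs s < m
        k_row = []
        for j, p_ij in enumerate(row):
            if j == i or p_ij == 0:
                k_row.append(None)  # nothing to correct
            elif j >= m:
                k_row.append(1)  # erasure: a single repetition
            else:
                k_real = -math.log(p_ij / norm) / math.log(m)
                k_row.append(max(1, round(k_real)))  # rounding is assumed
        table.append(k_row)
    return table


if __name__ == "__main__":
    # Binary input, ternary output; the third output acts as an erasure.
    channel = [[0.8375, 0.0625, 0.1], [0.0625, 0.8375, 0.1]]
    print(suggested_repetition_parameters(channel, m=2))
```

Rarer errors receive longer repetition runs, which keeps the expected cost $k_{ij} p_{ij}$ of each error type small.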

The rate of this suitably chosen multiple repetition feedback
strategy as a function of the channel is denoted by
$R((p_{ij})_{ij})$. The following theorem indicates how closely this
rate approaches the channel capacity.

Theorem 2 Consider an arbitrary discrete memoryless
channel with channel error probabilities $p_{ij}$ ($0 \le i < m$,
$0 \le j < \hat{m}$) and capacity $C$. If $p_{ij} \to 0$ for $0 \le i \ne j < m$, then

$$C - R = O\Big(\sum_{0 \le i \ne j < m} p_{ij}\Big).$$

It follows from Theorem 1 that the order of approximation in
Theorem 2 is tight.
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SUMMARY 

The unnormalized finite autocorrelation function $C(\tau)$
of the sequence $A = \{a_k\}_{k=1}^{n}$ of complex numbers on
the unit circle is defined by $C(\tau) = \sum_{k=1}^{n-\tau} a_k^{*} a_{k+\tau}$,
with $C(0) = n$, $C(-\tau) = C^{*}(\tau)$, and $|C(n-1)| = 1$, where $z^{*}$
denotes the complex conjugate of $z$. We seek the sequence
of length $n$, for each $n \ge 3$, which minimizes the value of

$$\max_{1 \le \tau \le n-2} |C(\tau)|,$$

and the value

$$T_n = \min_{\text{all sequences}} \; \max_{1 \le \tau \le n-2} |C(\tau)|$$

of this minimizing sequence.

As shown in [1], $\{|C(\tau)|\}$ is the same sequence
for $A$, for $A^{*} = \{a_k^{*}\}_{k=1}^{n}$, for $A' = \{a_{n+1-k}\}_{k=1}^{n}$, and
for each $A_{\alpha\beta} = \{\alpha \beta^{k} a_k\}_{k=1}^{n}$ where $\alpha$ and $\beta$ are complex
numbers with $|\alpha| = |\beta| = 1$.

A sequence $A$ with $\max_{1 \le \tau \le n-2} |C(\tau)| \le 1$ is called a
generalized Barker sequence [1]. A clever hill-climbing
program is described in [2], which reported empirical
values of $T_n$ for $3 \le n \le 25$, with the sequences
attaining these values. In particular, generalized Barker
sequences are claimed for all $n \le 25$ except for $n = 20$.
(A few of these examples are erroneous, although
some or all of these errors may be typesetting mistakes.)
No effort was made in [2] to describe the extremal
sequences algebraically nor to make use of the group $G$ of
correlation-magnitude-preserving transformations to
simplify the presentation of the data (as done in [1]). Here
are best sequences $A_n = \{a_k\}_{k=1}^{n}$, and the corresponding
correlation sequences $\{|C(\tau)|\}_{\tau=1}^{n-1}$, for $n = 3$, 4, and 6,
expressed algebraically.


n    a_1   a_2   a_3            a_4             a_5   a_6
3    1     1     -1
4    1     1     -e^{i\gamma}   -e^{3i\gamma}
5
6    1     1     e^{i\pi/3}     -1              1     e^{4\pi i/3}

(Here, $\gamma = \cos^{-1}(1/4) = 75.52248781\ldots^{\circ}$.)

n    |C(1)|   |C(2)|   |C(3)|   |C(4)|   |C(5)|   T_n
3    0        (1)                                 0
4    1/2      1/2      (1)                        1/2
5
6    1        1        1        1        (1)      1

The uniqueness (modulo the group $G$) of the sequence
of length 6 was shown in [3]. It is believed that the
methods used to obtain these results can be extended to other
values of $n$.
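
The tabulated sequences can be checked numerically. The following sketch assumes the convention $C(\tau) = \sum_k a_k^{*} a_{k+\tau}$ and reads the table entries as $A_3 = (1, 1, -1)$, $A_4 = (1, 1, -e^{i\gamma}, -e^{3i\gamma})$ and $A_6 = (1, 1, e^{i\pi/3}, -1, 1, e^{4\pi i/3})$ with $\gamma = \cos^{-1}(1/4)$; the helper names are illustrative only.

```python
import cmath
import math


def autocorrelation(a, tau):
    """Unnormalized finite autocorrelation C(tau) = sum_k conj(a_k) * a_{k+tau}."""
    a = [complex(x) for x in a]
    return sum(a[k].conjugate() * a[k + tau] for k in range(len(a) - tau))


def sidelobe_magnitudes(a):
    """Return [|C(1)|, ..., |C(n-1)|] for a sequence of length n."""
    return [abs(autocorrelation(a, tau)) for tau in range(1, len(a))]


GAMMA = math.acos(0.25)
A3 = [1, 1, -1]
A4 = [1, 1, -cmath.exp(1j * GAMMA), -cmath.exp(3j * GAMMA)]
A6 = [1, 1, cmath.exp(1j * math.pi / 3), -1, 1, cmath.exp(4j * math.pi / 3)]

if __name__ == "__main__":
    for name, seq in (("A3", A3), ("A4", A4), ("A6", A6)):
        mags = sidelobe_magnitudes(seq)
        # T_n ignores the last lag, whose magnitude is always 1.
        print(name, [round(x, 6) for x in mags], "max over 1..n-2:", round(max(mags[:-1]), 6))
```

The last lag is excluded from the maximum because $|C(n-1)| = |a_1^{*} a_n| = 1$ for every unit-modulus sequence.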
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Polyphase  sequences  over  N-th  complex  roots  of  unity  are 
considered. A sequence is perfect if all its out-of-phase periodic
autocorrelations equal zero. Over the past 30 years, numerous
constructions of perfect polyphase sequences (PPS) have
been proposed due to their importance in various applications
such as pulse compression radars, fast-startup equalization
and channel estimation, and spread spectrum multiple access
systems. We show that all previous PPS constructions known
to us can be classified into four classes (cf. [6]): (i) generalized
Frank sequences due to Kumar, Scholtz and Welch [4,
Thm. 3], (ii) generalized chirp-like polyphase sequences due
to Popovic [7], (iii) Milewski sequences [5], (iv) PPS associated
with the general construction of generalized bent functions
due to Chung and Kumar [1]. The key result here is a unified
construction of PPS which includes the above four classes as
special cases. Note, however, that only explicit constructions
of PPS are considered in this work, since PPS obtainable by
applying appropriate transformations to one or more previously
explicitly constructed PPS are always obtainable from
the unified construction in the same manner. Many useful
transformations  of  this  kind  can  be  found  in  [1,  Thm.l],[2],ld, 


bound on the number of PPS for a given $L$ and $N$. A computer
program, which finds all PPS derivable from (1), proves
that Theorem 1 in fact generates all possible PPS in the above
search range [6].

We conjecture a simple relationship between the length and
the minimum alphabet size.

Conjecture 1 Let $L = sm^2$, for $s, m \in \mathbb{Z}^{+}$ with $s$
square-free. A perfect polyphase sequence of length $L$ exists if and only
if its alphabet size $N$ is an integer multiple of $N_{\min}$, where $N_{\min}$
is the minimum alphabet size given by

$$N_{\min} = \begin{cases} 2sm & \text{for even } s \text{ and odd } m, \\ sm & \text{else.} \end{cases}$$
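
To make the conjectured relationship concrete, the sketch below factors a length $L$ as $s m^2$ with $s$ square-free and returns the conjectured $N_{\min}$. It only evaluates the formula of Conjecture 1; it does not test whether sequences exist, and the function names are hypothetical.

```python
def squarefree_decomposition(length):
    """Return (s, m) with length = s * m**2 and s square-free."""
    s, m, d = length, 1, 2
    while d * d <= s:
        while s % (d * d) == 0:  # pull every square factor d^2 out of s
            s //= d * d
            m *= d
        d += 1
    return s, m


def conjectured_min_alphabet(length):
    """N_min = 2*s*m for even s and odd m, and s*m otherwise (Conjecture 1)."""
    s, m = squarefree_decomposition(length)
    return 2 * s * m if (s % 2 == 0 and m % 2 == 1) else s * m


if __name__ == "__main__":
    for length in (2, 4, 8, 12, 16, 18):
        print(length, squarefree_decomposition(length), conjectured_min_alphabet(length))
```

For example, $L = 12 = 3 \cdot 2^2$ gives $N_{\min} = 6$, and $L = 4 = 1 \cdot 2^2$ gives $N_{\min} = 2$, consistent with the binary perfect sequence $(1, 1, 1, -1)$.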

This conjecture is closely related to some famous open
problems such as the nonexistence of Barker sequences,
circulant Hadamard matrices, and one-dimensional generalized
bent functions [6].

With the unified construction (1), it is not difficult to build
optimal sequence sets, with respect to the Sarwate bound
[8], that generalize all previously known constructions of this

inm.z],i*j.  —  .  uL_i 

For a polyphase sequence $\{\exp(2\pi\sqrt{-1}\,h(k)/L)\}_{k=0}^{L-1}$, we
call $\{h(k)\}_{k=0}^{L-1}$ its index sequence, whose components need
not be integers.

Theorem 1 Let $L = sm^2$, for $s, m \in \mathbb{Z}^{+}$. The polyphase
sequence of length $L$ defined by its index sequence

$$h(km + l) = \frac{m^2(s+1)}{2}\left(r_0 + n_0\,\frac{l(l+1)}{2}\right)k^2 + m\,(r_1 \pi(l) + n_1)\,k + f(l), \qquad \forall l \in \mathbb{Z}_m,\ \forall k \in \mathbb{Z}_{sm}, \qquad (1)$$

where $r_0$ is any integer in $\mathbb{Z}_s$ coprime to $s$, $n_0$ is any integer in
$\mathbb{Z}_s$ such that $(s+1)n_0$ is even and $r_0 + n_0 l(l+1)/2$ is coprime
to $s$ for all $l \in \mathbb{Z}_m$, $r_1$ is any integer in $\mathbb{Z}_{sm}$ coprime to $m$,
$n_1$ is any integer in $\mathbb{Z}_{sm}$, $\pi$ is an arbitrary permutation of the
elements of $\mathbb{Z}_m$, and $f(l)$, $\forall l \in \mathbb{Z}_m$, is an arbitrary
rational-valued function, is perfect.
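
A small numerical check can make the construction tangible. The sketch below assumes that the index sequence (1) reads $h(km+l) = \frac{m^2(s+1)}{2}\,(r_0 + n_0\, l(l+1)/2)\,k^2 + m\,(r_1\pi(l) + n_1)\,k + f(l)$, which is a reconstruction of a badly printed formula, and tests perfectness by computing all out-of-phase periodic autocorrelations. Function and parameter names are hypothetical.

```python
import cmath
from fractions import Fraction


def unified_pps(s, m, r0, r1, n1=0, n0=0, perm=None, f=None):
    """Build the length s*m^2 sequence from the assumed index sequence (1)."""
    length = s * m * m
    perm = list(range(m)) if perm is None else perm
    f = [Fraction(0)] * m if f is None else f
    seq = []
    for t in range(length):
        k, l = divmod(t, m)  # t = k*m + l with l in Z_m, k in Z_{sm}
        h = (Fraction(m * m * (s + 1), 2) * (r0 + Fraction(n0 * l * (l + 1), 2)) * k * k
             + m * (r1 * perm[l] + n1) * k + f[l])
        seq.append(cmath.exp(2j * cmath.pi * float(h % length) / length))
    return seq


def periodic_acf(a, tau):
    """Periodic autocorrelation sum_t a_t * conj(a_{t+tau})."""
    n = len(a)
    return sum(a[t] * a[(t + tau) % n].conjugate() for t in range(n))


def is_perfect(a, tol=1e-9):
    """True if every out-of-phase periodic autocorrelation vanishes."""
    return all(abs(periodic_acf(a, tau)) < tol for tau in range(1, len(a)))


if __name__ == "__main__":
    for s, m, r0, r1 in ((1, 2, 0, 1), (2, 1, 1, 1), (3, 1, 1, 1), (2, 2, 1, 1)):
        print((s, m), is_perfect(unified_pps(s, m, r0, r1)))
```

With $s = 1$, $m = 2$ the sketch produces the binary sequence $(1, 1, 1, -1)$, and with $s = 2$, $m = 1$ it produces $(1, -i)$.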

The  number  of  distinct  PPS  for  a  large  subset  of  (1)  is 
determined  below. 


Theorem 2 For the construction (1), the number of perfect
polyphase sequences of length $L = sm^2$ and alphabet size $N$
is $m!\,N^m$ for $s = 1$; and $sm\,(m!)\,\phi(s)\,N^m$ for $s > 1$, $n_0 = 0$
and $r_1 = 1$, where Euler's function $\phi(u)$ is the number of
integers in $\{1, 2, \ldots, u-1\}$ coprime to $u$.


Comparing with exhaustive search results for all PPS
satisfying $N < 15$, $L < 20$ and $N^L < 11^{11}$ (cf. [6]),
Theorem 2 predicts the exact numbers of all PPS found except for
$(L, N) = (12, 6)$. Hence, Theorem 2 gives an excellent lower

iThis  work  was  supported  by  the  Croucher  Foundation  Fellow- 


kind. 

Theorem 3 Denote the smallest prime divisor of $L$ by $p$.
Then the set of $p - 1$ perfect polyphase sequences of length
$L$ as defined in Theorem 1 with $n_0 = 0$, $\pi$ the identity map
and $r_0 = r_1$ an element of $\{1, 2, \ldots, p-1\}$, $n_1$ an arbitrary
integer, and $f(l)$, $\forall l \in \mathbb{Z}_m$, an arbitrary rational-valued function,
is optimal with respect to the Sarwate bound.


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Abstract  —  Golomb  sequences  of  length  L  form  a  class 
of  polyphase  sequences  which  have  a  perfect  periodic 
autocorrelation.  Given  certain  constraints,  they  also 
have a favorable aperiodic autocorrelation. This paper
presents a comprehensive study of the asymptotic
behavior of the aperiodic autocorrelation function of
Golomb sequences.


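Theorem 1

$$B_r \approx \begin{cases} 0.48\,L\sqrt{b/r}, \quad I_m(L) \approx \dfrac{bL \mp 1}{r}\,s_0, & r \ge 2,\ 0 < b/r \le 0.37, \\[2mm] \dfrac{L}{\pi}\sin\!\left(\dfrac{\pi b}{r}\right), \quad I_m(L) = \dfrac{bL \mp 1}{r}, & r \ge 2,\ 0.5 \ge b/r > 0.37, \end{cases} \qquad (5)$$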


I.  Introduction 

In 1953, R. H. Barker [1] introduced binary sequences
with particularly favorable aperiodic autocorrelation functions
(ACFs). In 1965, Golomb and Scholtz [2] proposed a class of
generalized polyphase Barker sequences which satisfy the
original Barker constraint on aperiodic autocorrelation. In order
to obtain a larger number of sequences with favourable
aperiodic correlation, Golomb [3] defined a class of infinite classes
of sequences. For Golomb sequences of arbitrary length $L$,

$$a_{r,k} = e^{i\pi r (k-1)k / L}, \qquad 1 \le k \le L,\ (r, L) = 1, \qquad (1)$$

Zhang and Golomb [3] proved that the maximum out-of-phase
aperiodic autocorrelation value with $r = 1$ (and $r = L - 1$) is
bounded by $\sqrt{L/4.438}$. When $L$ is odd, r = Fan, Darnell
and Honary [4] further showed that the out-of-phase aperiodic
autocorrelation value of Golomb sequences is asymptotically
bounded by $\sqrt{L/2.174}$. In this paper, we study the general
asymptotic behavior of Golomb sequences.

II. Basic Properties of Golomb Sequences

The maximum out-of-phase autocorrelation value of the
sequence $a_{r,k}$ is given by

$$B_r = \max_{0 < \tau < L} |C_r(\tau)| = |C_r(I_m(L))|, \qquad (2)$$

where $C_r(\tau) = \sum_{k=1}^{L-\tau} a_{r,k}\, a^{*}_{r,k+\tau}$, and $I_m(L)$ is the value of $\tau$
($0 < \tau < L$) which maximizes $|C_r(\tau)|$.

For Golomb sequences, it is simple to show that
Lemma 1

$$|C_r(\tau)| = \left| \frac{\sin(\pi r \tau^2 / L)}{\sin(\pi r \tau / L)} \right| \qquad (3)$$

Ct  (r)  =  (-l)(£~1)r+1  (i  -  r)  ,  Cr  (r)  =  Ct_r  (r) .  (4) 

Thus  we  need  only  consider  the  values  of  Cr  (r)  in  the  range 
°fl<r<  [f]  and  l<r<  [ijl]  .  S 
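
The magnitude of the aperiodic ACF of a Golomb sequence can be compared against a closed form numerically. The sketch below assumes the definition $a_{r,k} = e^{i\pi r(k-1)k/L}$ and the convention $C_r(\tau) = \sum_k a_{r,k}\,a^{*}_{r,k+\tau}$; under these assumptions a geometric-series argument gives $|C_r(\tau)| = |\sin(\pi r\tau^2/L)/\sin(\pi r\tau/L)|$, which is the form checked here. Function names are illustrative.

```python
import cmath
import math


def golomb_sequence(length, r):
    """a_{r,k} = exp(i*pi*r*(k-1)*k/L) for k = 1..L (assumed definition)."""
    return [cmath.exp(1j * math.pi * r * (k - 1) * k / length)
            for k in range(1, length + 1)]


def aperiodic_acf(a, tau):
    """C(tau) = sum_k a_k * conj(a_{k+tau}) over the overlapping part."""
    return sum(a[k] * a[k + tau].conjugate() for k in range(len(a) - tau))


def closed_form_magnitude(length, r, tau):
    """|sin(pi*r*tau^2/L) / sin(pi*r*tau/L)|, valid when gcd(r, L) = 1."""
    return abs(math.sin(math.pi * r * tau * tau / length)
               / math.sin(math.pi * r * tau / length))


if __name__ == "__main__":
    length = 400
    seq = golomb_sequence(length, 1)
    peak = max(abs(aperiodic_acf(seq, tau)) for tau in range(1, length))
    # For r = 1 the ratio should be near 0.48, i.e. B_1 is about sqrt(L/4.34).
    print(peak / math.sqrt(length))
```

For $r = 1$ the maximum of $\sin(\pi u)/(\pi\sqrt{u})$ over $u = \tau^2/L$ occurs where $\tan(\pi u) = 2\pi u$, i.e. $\pi u \approx 1.1655$, which is where the constants 0.48 and 0.37 in this abstract come from.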


where $bL \equiv \pm 1 \pmod r$, $1 \le b \le \lfloor r/2 \rfloor$, $s_0 = \sqrt{z_0 r/(\pi b)}$ and
$z_0 = 1.1655$.

When $r = 1$, which is excluded from the above derivation, we
have the following result, which is the same as in [3] but with
a simpler derivation.

Theorem  2 

-81  =  \A/4.34,  Im(L)  =  y/L/2M  8.  (6) 

The  result  given  in  [4]  can  be  obtained  directly  from  Eqn  5. 
Corollary  1  If  L  is  odd  and  r  =  then 

=  \A/2.17,  Im(L)  =  vT7l34.  (7) 


IV.  Summary 

In conclusion, we have considered the asymptotic maximum
out-of-phase ACF of Golomb sequences of arbitrary length $L$
and order $r$. It is shown that $B_r$ is bounded by $\sqrt{L/4.34}$ if
$r = 1$; or $0.48\,L\sqrt{b/r}$ if $r \ge 2$, $b/r < 0.37$; or $(L/\pi)\sin(\pi b/r)$
if $r \ge 2$, $b/r > 0.37$.
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III.  Asymptotic  Behavior  of  the  Aperiodic  ACF 
of  Golomb  Sequences 

Based  on  Lemma  1,  we  have  the  following  asymptotic  bound: 
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Abstract  —  New  classes  of  multi-level  and  complex 
sequences  with  perfect  periodic  autocorrelations  are 
presented.  The  sequences  are  derived  directly  from 
certain m-sequences over rational and Gaussian integers.


I.  Quasi-perfect  Multi-level  Sequences 

In their basic form, $p$-level m-sequences comprise the rational
integers $0, 1, 2, \ldots, (p-1)$, where $p$ is a prime. To derive a
practical bipolar sequence from such an m-sequence, integer and
sinusoidal level transformations can be used [1]. Both these
transformations yield bipolar signals with useful periodic ACF
properties. For $p = 3$ and 5, the integer level transformation
gives bipolar IR sequences $A = \{a_j\}$ of length $L = 2N$ with
quasi-perfect periodic ACFs of the form:

$$\theta_A(l) = \begin{cases} P, & l = 0, \\ -P, & l = N, \\ 0, & \text{otherwise}. \end{cases} \qquad (1)$$


II.  Quasi-perfect  Complex  Sequences 

Let $h(x) = x^n + h_{n-1}x^{n-1} + \cdots + h_1 x + h_0$, $h_j \in G_\pi$, denote a
primitive polynomial of degree $n$ over the residue class of Gaussian
integers, $G_\pi$. A maximal length sequence $A = \{a_j\}$ over $G_\pi$
can be obtained. It is shown that most of the properties of the
complex m-sequences are similar to those of maximal length
sequences over Galois fields; however, there are some particular
properties which are distinct [2]. Specifically, two sub-classes
of complex m-sequences of length $L = 4N$ with the following
quasi-perfect autocorrelation function have been obtained by
letting $\pi = 2 + i$ and $\pi = 3i$, which correspond to $p = 5$ and
$p = 3$ respectively.

$$\theta_A(l) = \sum_{j=0}^{L-1} a_j a^{*}_{j+l} = \begin{cases} P, & l = 0; \\ iP, & l = N; \\ -P, & l = 2N; \\ -iP, & l = 3N; \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$


III.  Synthesis  of  Perfect  Multilevel  and 
Complex  Sequences 

If two component multi-level sequences $A = \{a_j\}$ of period
$L = 2N$ and $B = \{(-1)^j\}$ of period 2 are combined using
digit-by-digit multiplication, the periodic ACF of the resulting
composite sequence $C$, $\theta_C(l)$, is given by

$$\theta_C(l) = (-1)^l\,\theta_A(l) = \begin{cases} \theta_A(l), & l \equiv 0 \bmod 2, \\ -\theta_A(l), & l \equiv 1 \bmod 2. \end{cases} \qquad (3)$$


If sequence $A$ is chosen as a transformed $p$-level m-sequence
with quasi-perfect ACF, and the length of this sequence $A$ is
exactly divisible by 2 to give an odd integer $N$, then due to
the inverse-repeat (IR) format of $A$, the digit-by-digit
multiplication process yields a multi-level perfect sequence $C$ of
period $N$: $C' = (c_0, c_1, \ldots, c_{N-1})$.

If the two component complex sequences $A = \{a_j\}$ of period
$L = 4N$ and $B = \{i^j\}$ of period 4 are combined
using digit-by-digit multiplication, the periodic ACF of the
resulting composite sequence $C$ is given by

$$\theta_C(l) = \begin{cases} \theta_A(l), & l \equiv 0 \bmod 4, \\ -i\,\theta_A(l), & l \equiv 1 \bmod 4, \\ -\theta_A(l), & l \equiv 2 \bmod 4, \\ i\,\theta_A(l), & l \equiv 3 \bmod 4. \end{cases} \qquad (4)$$

Similarly, if the complex sequence $A$ is a quasi-perfect
sequence of period $L = 4N$, where $N$ is an odd number,
then the synthesised sequence $C$ has a perfect ACF. Let
$C' = (c_0, c_1, \ldots, c_{N-1})$; then $C'$ is a perfect sequence of period
$N$.


IV.  Synthesis  Examples 

Firstly, consider the ternary m-sequence obtained using the
integer level transformation. Given the values $p = 3$, $n = 3$
and $p^n - 1 = 26$, a perfect sequence of length 13 can be
obtained: $C' = (0, -1, 1, -1, 0, 0, -1, 0, -1, -1, -1, 1, 1)$,
$\theta_{C'} = (9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)$.
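
The first example is short enough to verify directly. The sketch below reads the printed sequence as $C' = (0, -1, 1, -1, 0, 0, -1, 0, -1, -1, -1, 1, 1)$ and computes its full periodic ACF; the helper name is hypothetical.

```python
def periodic_acf(c):
    """Periodic autocorrelation of a real sequence for every shift l = 0..n-1."""
    n = len(c)
    return [sum(c[j] * c[(j + l) % n] for j in range(n)) for l in range(n)]


C_PRIME = [0, -1, 1, -1, 0, 0, -1, 0, -1, -1, -1, 1, 1]

if __name__ == "__main__":
    print(periodic_acf(C_PRIME))
```

The zero-shift value counts the nine non-zero entries, and every other shift cancels exactly, so this ternary sequence of period 13 is perfect.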

Secondly, consider a complex m-sequence of length
$L = 5^3 - 1 = 124$ generated by the primitive polynomial
$h(x) = x^3 + ix^2 + x - i$ over $G_{2+i}$. Then a

new  perfect  sequence  C'  can  then  be  derived  as  fol¬ 
lows:  C'  =  (0, 0,  —1,  —1, *,  1, M.  -1,1, 1.1.  t',0,-l,-i, 

»c>  =  (25,0,0,0,0,0,0,0,0,0,0,0, 
0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,0,  0,  0,0,  0,  0,  0,  0) 
Abstract — Given any prime $p \equiv 1 \bmod 4$ and any
positive integer $m$, a class of balanced quadriphase
sequences of length $p^m - 1$ with near-ideal periodic
autocorrelation properties is constructed. The
quadriphase sequences are optimal under the condition of
balanced sequence elements.


I.  Introduction 

In 1977, Lempel, Cohn and Eastman described a class of
balanced binary sequences with optimal periodic autocorrelation
properties [1]. Their work is related to the construction of
orthogonal matrices [2, 3]. Given any odd prime $p$ and any
positive integer $m$, a balanced ($\pm 1$) binary sequence of length
$p^m - 1$ whose out-of-phase autocorrelation function $R(\tau)$
satisfies $R(\tau) = +2$ or $-2$ for $(p^m - 1)/2$ odd, and $R(\tau) = 0$
or $-4$ for $(p^m - 1)/2$ even is obtained. It is shown that
every balanced binary sequence must have at least two distinct
out-of-phase correlation values which are at least as high as
those obtained by Lempel et al. It is in this sense that their
sequences are optimal.

In this paper we describe a generalization of balanced binary
sequences to quadriphase sequences. It is shown that for any
prime $p \equiv 1 \bmod 4$ and any positive integer $m$, a class of
balanced quadriphase sequences of length $p^m - 1$ with
near-ideal periodic autocorrelation properties can be constructed.
The quadriphase sequences obtained are also optimal under
the condition of balanced sequence elements.


II.  Main  Result 

Consider  a  finite  field  F  =  GF(pm),  where  p  =  1  mod  4  and 
m  is  a  positive  integer:  let  G  denote  the  multiplicative  group 
of  F  and  a  be  any  primitive  element  of  F,  i.e.,  a  is  a  generator 
of  the  cyclic  group  G.  Consider  also  the  subset  Si  of  G  defined 
by 


5,  = 


1  =  1,2,3;  k  = 


(1) 


S4  =  G  \  {Si  U  S2  U  S3)  (2) 

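Note that each $S_l$ contains exactly one quarter of the elements
of $G$ and that every element of $S_l$ is equal to some power of
$\alpha$.

Let $f$ denote the mapping from $G$ onto $\{1, i, -1, -i\}$ defined
by

$$f(\alpha^t) = \begin{cases} 1, & \text{if } \alpha^t \in S_1 \\ i, & \text{if } \alpha^t \in S_2 \\ -1, & \text{if } \alpha^t \in S_3 \\ -i, & \text{if } \alpha^t \in S_4 \end{cases} \qquad 0 \le t < 4k = p^m - 1 \qquad (3)$$

i.e.

$$f(\alpha^t) = i^{\,l-1}, \quad \text{if } \alpha^t \in S_l, \qquad 0 \le t < 4k = p^m - 1 \qquad (4)$$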

Based  on  the  above,  we  can  prove  the  following  result: 

Theorem 1 The periodic autocorrelation function $R(\tau)$ of
the quadriphase sequence $a = a_0, a_1, \ldots, a_{4k-1}$, where
$a_t = f(\alpha^t)$, $0 \le t \le 4k - 1$, satisfies $R(0) = 4k$ and, for
$0 < \tau \le 4k - 1$,

$$R(\tau) = \begin{cases} 0 \text{ or } \pm 2 \text{ or } \pm 2i \text{ or } \pm 2 \pm 2i, & k \text{ odd} \\ 0 \text{ or } \pm 2 \text{ or } \pm 2i \text{ or } \pm 2 \pm 2i \text{ or } \pm 4 \text{ or } \pm 4i \text{ or } \pm 4 \pm 4i, & k \text{ even} \end{cases} \qquad (5)$$

Moreover, $a$ is balanced and $R(\tau)$ is optimal, given the
condition of balance.


III.  Examples 

Example 1: $p = 13$, $m = 1$, $k = (p^m - 1)/4 = 3$, and $\alpha = 2$. For
this set of parameters we obtain

$\{\alpha^t : 1, 2, 4, 8, 3, 6, 12, 11, 9, 5, 10, 7\}$

$\{f(\alpha^t) : 1, -i, 1, i, -1, -i, i, i, 1, -1, -1\}$

$\{R(\tau) : 12, -2, -2, 2i, -2+2i, 2i, 0, -2i, -2-2i, 2i, -2, -2\}$

Example 2: $p = 5$, $m = 2$, $k = (p^m - 1)/4 = 6$, and $\alpha = x = (0, 1)$.
For this set of parameters we obtain

$\{\alpha^t$ : (1,0), (0,1), (2,2), (4,1), (2,1), (2,4), (3,0), (0,3),
(1,1), (2,3), (1,3), (1,2), (4,0), (0,4), (3,3), (1,4),
(3,4), (3,1), (2,0), (0,2), (4,4), (3,2), (4,2), (4,3) $\}$
{/(o')  :  i,  -i,  1, 1, 1,  -i,  i,  -it  it  i,  _1>  _2j  1;  _J-) 
-1,  i,  —1, 1,  i ,  —1,  —1  } 

$\{R(\tau)$ : $24$, $-2+2i$, $2i$, $-2-2i$, $0$, $-2+2i$, $0$, $0$,
$-2$, $-2+2i$, $2i$, $0$, $-4$, $0$, $-2i$, $-2-2i$, $-2$,
$0$, $0$, $-2-2i$, $0$, $-2+2i$, $-2i$, $-2-2i$ $\}$
Abstract — The use of quasi-linear synchronization
(QLS) codes to provide synchronization of frames
with fixed length $n$ offers many advantages relative
to comma-free codes and prefix synchronized codes.
Easy frame location and the absence of data conversion
enable a QLS-code to be implemented with very
low complexity independent of the frame length.
Another important aspect is the ability of error control in
the presence of substitution errors. A list of optimal
QLS-codes of length up to 40 obtained with elaborate
computer search is presented. Several families of
perfect and (sub)optimal QLS-codes with large word
length $n$ have been constructed, and also new upper
bounds on the redundancy of the codes have been
established.

I.  Introduction 

A quasi-linear synchronization (QLS) code [1], being a coset of
a linear code, allows easy encoding and decoding, easy frame
location, and error control. Consider a code $C$ of length $n$,
being a subset of $A_q^n$, where $A_q$ denotes the $q$-ary alphabet
$\{0, 1, \ldots, q-1\}$. The synchronization and error control
properties of a code $C \subseteq A_q^n$ are determined by the code distance
$d(C)$ (i.e. the minimal Hamming distance of the code), and by
the code separation $\rho(C)$, defined by

$$\rho(C) = \min_{\substack{0 < i < n \\ X, Y, Z \in C}} d(T_i(X, Y), Z), \qquad (1)$$

with shift operator $T_i(X, Y) = x_i x_{i+1} \ldots x_{n-1}\, y_0 y_1 \ldots y_{i-1}$.
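
For small codes the separation in (1) can be computed by brute force. The following sketch assumes the definition $\rho(C) = \min d(T_i(X,Y), Z)$ over $0 < i < n$ and $X, Y, Z \in C$, with $T_i(X,Y)$ formed from the last $n - i$ symbols of $X$ followed by the first $i$ symbols of $Y$; code words are represented as strings and the function names are hypothetical.

```python
from itertools import product


def hamming(u, v):
    """Hamming distance between two words of equal length."""
    return sum(a != b for a, b in zip(u, v))


def shift(x, y, i):
    """T_i(X, Y) = x_i ... x_{n-1} y_0 ... y_{i-1}: the window overlapping X and Y."""
    return x[i:] + y[:i]


def separation(code):
    """rho(C): the smallest distance between any misaligned window and any code word."""
    n = len(code[0])
    return min(
        hamming(shift(x, y, i), z)
        for i in range(1, n)
        for x, y, z in product(code, repeat=3)
    )


if __name__ == "__main__":
    print(separation(["011"]))
    print(separation(["000"]))
```

A separation of 0 means some misaligned window is itself a code word, so frame boundaries cannot be located; larger values leave room for substitution errors.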

The code $C \subseteq A_q^n$ is called a quasi-linear synchronization
code of length $n$ and separation $\rho$, if for each code word
$X \in C$, a fixed set $P$ of positions is used, establishing separation
$\rho(C) \ge \rho$ irrespective of the actual value of the other (data)
positions. In this way, an arbitrary data word $D \in A_q^m$ of
length $m = n - |P|$ can be easily inserted at the data positions.
A $q$-ary QLS-code of length $n$ and separation $\rho$ is called a
QLS($q$, $n$, $\rho$) code.

The use of distinct synchronization positions and
unconstrained data positions allows easy encoding and decoding for
any separation. QLS-codes with separation $\rho > 1$ can be used
for error control coding. Correct synchronization and error
correction can be guaranteed in the presence of no more than
$t$ substitution errors in $n$ successive symbols for a code $C$ with
$d(C) \ge 2t + 1$ and $\rho(C) \ge 2t + 1$.

II.  Bounds  and  Code  Constructions 
The redundancy $R$ of a $q$-ary QLS-code of length $n$ and
separation $\rho$ is, according to Levenshtein [1], bounded by
$R \ge R_{\min}(q, n, \rho)$, with


The construction of $q$-ary QLS-codes with arbitrary code
separation is in general difficult, especially the construction of
QLS-codes with minimal redundancy, so-called optimal codes.


Ellernstr.  29,  45326  Essen,  Germany 

Using constructions proposed by Levenshtein [1], optimal
binary QLS-codes with separation $\rho \le 2$ and redundancy $R_{\min}$
can always be obtained for any length $n$. For $\rho > 2$, two new
upper bounds on the redundancy have been obtained for
binary codes, based on construction methods. Firstly, a binary
QLS-code can be constructed with redundancy $R_1(2, n, \rho)$,
bounded by $R_1(2, n, \rho) \le R_{\min}(2, n, \rho) + \varphi(\rho)$, for which
$\rho - 2 \le \varphi(\rho) \le 3\rho - 2$. The term $\varphi(\rho)$ is independent of the length
$n$, therefore the constructed codes are asymptotically optimal
for $n \to \infty$ and $\rho/n \to 0$. Secondly, for $n \ge \rho(2\rho^2 + 2\rho + 1)$,
a binary QLS-code can be constructed with redundancy
$R_2(2, n, \rho)$, bounded by $R_2(2, n, \rho) \le R_{\min}(2, n, \rho) + \rho - 1$.

Several search methods can be used to find optimal codes
with larger separation ($\rho > 2$). In this way optimal QLS-codes
with length $n$ up to 40 have been found.

III.  Combinatorial  Construction  Methods 
The theory of difference sets [3] is sometimes applicable for the
construction of QLS-codes. It is convenient to use the
following combinatorial description of a $q$-ary QLS-code of length $n$
and separation $\rho$. The index position set $P$ is partitioned into
$q$ subsets $P_0, P_1, \ldots, P_{q-1}$ in such a manner that for each
number $d \not\equiv 0 \pmod n$ there are at least $\rho$ pairs $(x_i, y_j)$, $x_i \in P_i$,
$y_j \in P_j$, $i \ne j$, satisfying $x_i - y_j \equiv d \pmod n$. If there are
exactly $\rho$ pairs for each $d$, the code is called perfect.
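
The combinatorial condition can be tested mechanically by counting cross-class differences. In the sketch below the partition of the non-zero residues modulo a prime into quadratic residues and non-residues is used purely as an illustrative input; whether this is the construction intended by the authors is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def cross_class_difference_counts(classes, n):
    """Count ordered pairs (x in P_i, y in P_j), i != j, by their difference x - y mod n."""
    counts = Counter()
    for i, p_i in enumerate(classes):
        for j, p_j in enumerate(classes):
            if i == j:
                continue
            for x in p_i:
                for y in p_j:
                    counts[(x - y) % n] += 1
    return counts


def separation_and_perfect(classes, n):
    """Return (rho, perfect): the minimum count over d != 0 and whether all counts agree."""
    counts = cross_class_difference_counts(classes, n)
    values = [counts[d] for d in range(1, n)]
    return min(values), len(set(values)) == 1


if __name__ == "__main__":
    # n = 7: quadratic residues {1, 2, 4} versus non-residues {3, 5, 6}.
    print(separation_and_perfect([{1, 2, 4}, {3, 5, 6}], 7))
    # n = 5: quadratic residues {1, 4} versus non-residues {2, 3}.
    print(separation_and_perfect([{1, 4}, {2, 3}], 5))
```

Both inputs use $n - 1$ synchronization positions and yield the same count $(n-1)/2$ for every non-zero difference, matching the parameters quoted for the second perfect family.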

It is apparently very difficult to construct perfect codes
with arbitrary parameters $n$ and $\rho$. Using the theory of
difference sets, two families of perfect QLS-codes can be directly
constructed. Firstly, for length $n = 4t^2 + 1$, $t$ being an odd
positive integer and $n$ being prime, perfect QLS-codes with
separation $\rho = (t^2 + 1)/2$ can be constructed. Secondly, for
any length $n$, $n$ prime, perfect QLS-codes with separation
$\rho = (n - 1)/2$ can be constructed. The redundancy is equal
to $2t^2 + 1$ and $n - 1$ respectively. Both perfect codes turn
out to be unique, i.e. there are no solutions with the same
parameters which do not belong to this family. It is expected
that several other families of perfect codes will be found for
binary as well as $q$-ary codes.

IV.  Conclusion 

It  has  been  shown  that  for  binary  QLS-codes  of  arbitrary 
length  and  separation  constructions  can  be  obtained  which 
are  close  to  optimal.  For  a  large  variety  of  QLS-codes,  es¬ 
pecially  for  the  important  category  of  small  codes,  optimal 
codes  have  been  found.  Using  combinatorial  methods,  various 
families  of  perfect  QLS-codes  have  been  obtained  as  well. 
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Abstract  Sonar  as  well  as  other  related  sequences 
were  introduced  by  Golomb  and  Taylor  in  [2].  Fol¬ 
lowing  a  similar  approach, we  introduce  the  concept 
of  an  extended  sonar  sequence.  It  is  similar  to  that 
of  a  sonar  sequence  but  blank  columns  are  permitted. 
Several  constructions  of  extended  sonars  are  given. 
Our  constructions  are  very  close  to  ordinary  sonar  se¬ 
quences.  However  they  provide  good  improvements 
to  the  list  of  the  best  known  constructions  for  sonar 
sequences up to 100 symbols given in [3].


wave) time slots, then we would call it an (n, m, k) extended
sonar.

The case when k = 0 is that of a sonar, and the case of
n = 1 reduces to what has been studied previously under the
name of rulers, which have other applications besides radar
and sonar to synchronization, crystallography, etc. (see [1]).
In other words, extended sonar sequences are a natural generalization
of sonars and also of rulers. The main point of
the present talk is to give several constructions of extended
sonars.


I.  Introduction 

Sonar  sequences  were  introduced  in  [2]  to  deal  with  the 
following  problem:  “You  have  an  object  which  is  moving  to¬ 
wards  (or  away)  from  you,  and  you  want  to  know  effectively 
your  distance  and  velocity  of  the  object.” 

The  solution  to  the  problem  comes  from  using  the  Doppler 
effect:  when  a  wave  hits  a  moving  object  its  frequency  changes 
in  direct  proportion  to  the  velocity  of  the  object.  In  other 
words  you  send  a  wave,  wait  until  it  returns  and  from  the 
time  it  takes  you  know  the  distance,  from  the  new  frequency 
you  know  the  velocity.  On  the  other  hand  since  the  world  is 
noisy  you  might  send  out  a  wave  that  does  not  return.  Conse¬ 
quently  you  send  out  m  waves  with  frequencies  ranging  from 
1  to  n.  Waves  are  sent  out  at  times  ranging  from  1  to  m. 
Once  the  whole  pattern  of  waves  returns,  from  the  change  in 
frequency  you  determine  the  velocity  of  your  object  and  from 
the  time  change  the  distance.  On  the  other  hand  if  not  all  the 
frequencies  return  there  might  be  some  ambiguities  as  to  what 
is  the  whole  pattern.  Sonars  are  those  patterns  for  which  you 
send  out  exactly  one  wave  at  every  time  and  also  for  which 
even  if  only  two  waves  return  you  can  reconstruct  the  whole 
pattern.  This  last  point  means  that  there  is  no  ambiguity. 
The  problem  for  sonars  is,  given  n  frequencies,  construct  an 
n  by  m  sonar  sequence  for  m  as  large  as  possible. 
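
The "no ambiguity" requirement can be phrased as a distinct-difference test: no two pairs of transmitted waves may share both the same time offset and the same frequency difference. The sketch below assumes that reading of the property; the function name and the use of `None` for a time slot in which no wave is sent are illustrative choices.

```python
def is_sonar(seq):
    """seq[t] is the frequency sent in time slot t, or None for a blank slot.

    Returns True if all pairs of sent waves have distinct
    (time offset, frequency difference), so that any two returning
    waves pin down the position of the whole pattern."""
    seen = set()
    m = len(seq)
    for i in range(m):
        if seq[i] is None:
            continue
        for j in range(i + 1, m):
            if seq[j] is None:
                continue
            key = (j - i, seq[j] - seq[i])
            if key in seen:
                return False
            seen.add(key)
    return True

print(is_sonar([1, 2, 2, 1]))           # True
print(is_sonar([1, 2, 3]))              # False: offset 1, difference 1 occurs twice
print(is_sonar([1, 2, None, 4, 5]))     # False, even with a blank slot
```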

II.  Extended  Sonar  sequences 

The  point  of  this  talk  is  that  for  the  sonar  application  an 
alternative  to  sending  exactly  one  wave  at  every  time  (the 
sonar  case)  is  that  of  sending  at  most  one  wave,  or  in  other 
words,  choose  not  to  send  any  wave  in  some  time  slots.  This  is 
done  to  achieve  a  larger  number  of  waves  sent  for  a  given  num¬ 
ber  of  frequencies,  while  increasing  the  probability  of  receiving 
at  least  the  two  frequencies  needed  to  reconstruct  the  whole 
sequence.  Because  of  the  similarity  with  the  common  sonar 
sequences,  our  sequences  will  be  using  the  same  equipment,  in 
other  words,  it  will  be  more  cost  effective  than  common  sonar 
sequences.  Again  we  would  send  m  waves  with  frequencies 
ranging from 1 to n, and let us say that there are k blank (no
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III.  The  Constructions 

We  will  show  how  some  of  the  constructions  used  in  [3]  to 
generate  Costas  and  Sonar  sequences,  have  a  circular  periodic¬ 
ity  property  that  is  the  basis  of  our  constructions  of  extended 
sonar  sequences,  namely  the  Extended  Logarithmic  Welch,  the 
Extended  Shift  Sequence  and  the  Extended  Lempel-Golomb. 
These three constructions with k = 1 are very similar to ordinary
sonars, yet the table of our constructions for
n up to 100 outperforms the corresponding table of the best
known constructions for sonars given in [3]. For example, for
n = 46 and n = 75 it fills 7 more slots than common sonar
sequences.

We have also tested the performance of these constructions by
comparing them with the best possible extended sonar sequences
obtained by an exhaustive search. The difficulty with
generating extended sonar sequences exhaustively by
computer is that the computation time
increases exponentially. The only practical way to obtain a
sonar or extended sonar sequence of large length is therefore
to generate it with some particular construction. At the
moment we have completed the exhaustive search for up to m = 10.
The constructions attained the best possible value 60% of the
time.
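
To illustrate why exhaustive generation becomes expensive, here is a minimal depth-first search for a longest ordinary sonar sequence (no blank slots) over n frequencies. It is a sketch under the distinct-difference reading of the sonar property, not the search procedure used for the results above; the function name is hypothetical.

```python
def longest_sonar(n):
    """Exhaustive depth-first search for a longest sonar sequence
    (one wave per time slot) over the frequencies 1..n."""
    best = []

    def extend(seq, seen):
        nonlocal best
        if len(seq) > len(best):
            best = list(seq)
        for f in range(1, n + 1):
            # (time offset, frequency difference) pairs created by appending f
            new = {(len(seq) - i, f - seq[i]) for i in range(len(seq))}
            if not (new & seen):
                seq.append(f)
                extend(seq, seen | new)
                seq.pop()

    extend([], frozenset())
    return best

print(len(longest_sonar(2)))  # 4
```

The search always terminates because adjacent slots can show at most 2n - 1 distinct frequency differences, so no sequence is longer than 2n; the number of partial sequences visited nevertheless grows quickly with n.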

We will define circular extended sonar sequences and
then prove that the Logarithmic Welch, the Shift Sequence
and the Lempel-Golomb constructions give circular
extended sonar sequences. We will then show that from
any circular extended sonar sequence we can obtain n extended
sonar sequences.

Then we will apply a series of transformations to the resulting
extended sonar sequence to obtain a sequence with
a reduced number of symbols, obtaining the best known extended
sonar sequences.
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Abstract  —  A  new  synchronization  code  construc¬ 
tion  technique  is  presented  which  uses  a  so  called  ex¬ 
tended  prefix  containing  positions  with  fixed  symbols 
and  unconstrained  data  positions,  followed  by  a  con¬ 
strained  data  sequence.  In  this  way  a  set  of  prefixes  is 
used  to  identify  the  frame,  instead  of  only  one  prefix, 
as  for  normal  prefix  synchronized  (PS)  codes.  This 
enlarges  the  code  size,  while  the  advantages  of  PS- 
codes,  i.e.  easy  frame  recognition  and  the  availability 
of  data  mapping  procedures,  are  maintained. 

I.  Introduction 

Synchronization of fixed length frames can be performed using
comma-free codes [1]. Several maximal comma-free codes
can be constructed [1], but both frame recognition and data
mapping tend to be very complex. One solution is to use so-called
prefix synchronized (PS) codes, introduced by Gilbert
[2], and further analyzed by Guibas and Odlyzko [3]. A PS-code
$C_P(k + m)$ is defined as a set of code words of length
$n = k + m$ with q-ary symbols of the alphabet $A_q$, with the
property that for any code word $p_1 p_2 \cdots p_k c_1 c_2 \cdots c_m$ the prefix
$P = p_1 p_2 \cdots p_k$ does not appear anywhere in the sequence
$p_2 \cdots p_k c_1 \cdots c_m p_1 \cdots p_{k-1}$.

We  will  modify  the  marker  by  lifting  the  condition  to  use 
consecutive  fixed  symbols.  The  modified  marker  is  called  an 
extended  prefix.  After  a  formal  definition  of  extended  prefix 
synchronized  (EPS)  codes,  the  construction  of  extended  pre¬ 
fixes  will  be  described,  and  expressions  for  the  cardinality  will 
be  derived  in  order  to  compare  the  EPS-codes  with  PS-codes. 
Finally,  a  data  mapping  procedure  will  be  presented. 

II.  Code  description 

Prior to giving an exact definition of prefix synchronized
codes, the correlation between two sequences will be defined.
For two sequences X and Y of length n the correlation of X over
Y, denoted by $X \circ Y$, is a binary vector $r_1 r_2 \cdots r_n$, where $r_i$ is
1 if the subsequence $x_i x_{i+1} \cdots x_n$ equals $y_1 y_2 \cdots y_{n-i+1}$, and 0
otherwise.
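
A short sketch of the correlation, assuming the reading that bit i is 1 exactly when the last n - i + 1 symbols of X coincide with the first n - i + 1 symbols of Y. Sequences are represented as Python strings, which is an illustrative choice.

```python
def correlation(x, y):
    """Correlation of x over y for equal-length strings.

    Bit i (1-based) is '1' when the suffix x_i..x_n of x equals the
    prefix y_1..y_{n-i+1} of y, and '0' otherwise."""
    n = len(x)
    if len(y) != n:
        raise ValueError("sequences must have the same length")
    return "".join("1" if x[i:] == y[:n - i] else "0" for i in range(n))

print(correlation("1100", "1100"))     # 1000
print(correlation("11000", "11010"))   # 00000
print(correlation("1010", "1010"))     # 1010
```

The first two outputs match the patterns $10^{k-1}$ and $0^h$ that the text requires of a prefix with itself and of two distinct prefixes of an extended marker.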

For q-ary PS-codes with q < 4, prefixes P of size k with
correlation $P \circ P = 10^{k-1}$ maximize the code set [3]. These PS-codes
have the following form: $C_P(k + m) = P\,\mathcal{F}_P(m)$, where
$\mathcal{F}_P(m)$ denotes the set of constrained sequences $c_1 \cdots c_m$ in
which P does not appear as a subsequence.

As an example of extended prefixes, let us consider the set
of two patterns 11000 and 11010, denoted by 110*0. The code
set $C_{110*0}(5 + m)$ is the union of the sets $11000\,\mathcal{F}_{110*0}(m)$ and
$11010\,\mathcal{F}_{110*0}(m)$, where $\mathcal{F}_{110*0}(m) = \mathcal{F}_{11000}(m) \cap \mathcal{F}_{11010}(m)$.
We notice that each code word in $C_{110*0}(5 + m)$ belongs to a
PS-code of prefix 11000 or to a PS-code of prefix 11010. The
advantage of this extended marker is to obtain one unconstrained
position (fourth position) while the disadvantage is
to force the remaining part of each code word to be more constrained,
since neither 11000 nor 11010 is allowed to occur
in the constrained sequence. The cardinality of $C_{110*0}(5 + m)$
is equal to $2\,|\mathcal{F}_{11000}(m) \cap \mathcal{F}_{11010}(m)|$. According to Gilbert
[2], prefixes of length 4, e.g. 1100, have the maximal code size
among PS-codes of length 11 up to 21. In this range the inequality
$|C_{110*0}(n)| > |C_{1100}(n)|$ always holds, e.g. for n = 13,
EPS-code $C_{110*0}(13)$ with 384 code words is 17.8% larger than
PS-code $C_{1100}(13)$ with 326 code words.

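An extended prefix synchronization code uses an extended
marker $\mathcal{P}$ of length h with k fixed positions and h-k unconstrained
data positions. In fact, $\mathcal{P}$ is a set of $q^{h-k}$ different
prefixes. For every pair of prefixes $P_i, P_j \in \mathcal{P}$ the correlation
$P_i \circ P_j$ is equal to $10^{h-1}$ if i = j, and $0^h$ otherwise. In this
case the code $C_{\mathcal{P}}(h + m)$ with extended prefix $\mathcal{P}$ is defined by

$$C_{\mathcal{P}}(h + m) = \mathcal{P}\,\mathcal{F}_{\mathcal{P}}(m) = \bigcup_{P_i \in \mathcal{P}} \Big\{ P_i \bigcap_{P_l \in \mathcal{P}} \mathcal{F}_{P_l}(m) \Big\}.$$

We will show that for each $P_i \in \mathcal{P}$, $C_{\mathcal{P}}(h + m)$ is a PS-code
with prefix $P_i$. The cardinality of an EPS-code, $C_{\mathcal{P}}(h + m)$,
equals $q^{h-k} F_{\mathcal{P}}(m)$, where $F_{\mathcal{P}}(m)$ denotes the size of $\mathcal{F}_{\mathcal{P}}(m)$.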

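Theorem 1 An extended prefix synchronized code $C_{\mathcal{P}}(h + m)$
with extended marker $\mathcal{P}$ of length h and k fixed positions has
generating function

$$\frac{1}{1 - qz + q^{h-k} z^h}, \qquad (1)$$

which provides the following recursive formula for $F_{\mathcal{P}}(m)$:

$$F_{\mathcal{P}}(m) = \begin{cases} q^m, & 0 \le m < h, \\ q\,F_{\mathcal{P}}(m-1) - q^{h-k} F_{\mathcal{P}}(m-h), & m \ge h. \end{cases}$$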
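
The recursion can be checked numerically against the code sizes quoted in the example (326 and 384 code words for n = 13). The sketch below assumes the initial values F(m) = q^m for m < h, which is what the generating function implies, and compares the recursion with a brute-force count; the function names are illustrative.

```python
from itertools import product

def count_recursive(q, h, k, m):
    """F(m) from the recursion: q^m for m < h, and
    q*F(m-1) - q^(h-k)*F(m-h) for m >= h."""
    f = []
    for j in range(m + 1):
        f.append(q ** j if j < h else q * f[j - 1] - q ** (h - k) * f[j - h])
    return f[m]

def count_brute_force(prefixes, m, alphabet="01"):
    """Number of length-m words that contain none of the given prefixes."""
    return sum(
        1
        for word in map("".join, product(alphabet, repeat=m))
        if not any(p in word for p in prefixes)
    )

# PS-code with prefix 1100 (h = k = 4) and total length 13, so m = 9.
ps_size = count_recursive(2, 4, 4, 9)
# EPS-code with extended prefix 110*0 (h = 5, k = 4) and total length 13, so m = 8.
eps_size = 2 ** (5 - 4) * count_recursive(2, 5, 4, 8)
print(ps_size, eps_size)  # 326 384
```

Both values agree with the figures in the text, and `count_brute_force(["11000", "11010"], 8)` returns the same 192 constrained sequences that the recursion gives for the extended prefix.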

We found that, for given k, binary EPS-codes having an
extended  prefix  of  the  form  P  =  ll0  (Y~l  0)k~l  1  with  t  = 
\k/2]  have  maximal  cardinality.  Mapping  procedures  have 
been developed which associate each number x in the range
$0 \le x < F_{\mathcal{P}}(m)$ with a unique word of the set $\mathcal{F}_{\mathcal{P}}(m)$ and vice
versa.


III.  Conclusion 

A new, so-called extended prefix synchronized code has been
presented, as well as methods to construct extended prefixes,
an expression to exactly determine the cardinality of an arbitrary
q-ary EPS-code, and a mapping procedure to generate
codes with maximal code size. EPS-codes allow easy frame
detection, have a coding complexity roughly equivalent
to that of PS-codes, and prove to have a larger code size
than traditional PS-codes.
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Abstract  A  systematic  procedure  for  mapping 
data  sequences  into  code  words  of  a  binary  maximal 
prefix synchronized (MPS) code as well as for the inverse
mapping is presented. The complexity of the
proposed  scheme  is  proportional  to  the  code  word 
length.  In  order  to  be  able  to  choose  another  pre¬ 
fix,  e.g.  a  Barker  sequence,  methods  will  be  presented 
which convert an MPS code into another MPS code with
a different prefix. Both the mapping algorithm and
the conversion algorithm can be generalized for q-ary
prefix synchronized codes.

I.  Introduction 

A prefix-synchronized (PS) code, introduced by Gilbert [1] and
further analyzed by Guibas and Odlyzko [2], is a collection of
code words of length k + n over an alphabet $A_q$ of size q whose
first k symbols equal the prefix $P = p_1 p_2 \cdots p_k$, and in addition,
any code word $p_1 \cdots p_k c_1 \cdots c_n$ satisfies the constraint
that P does not appear as a block of k consecutive symbols
anywhere in $p_2 \cdots p_k c_1 \cdots c_n p_1 \cdots p_{k-1}$. Let $\mathcal{G}_P^{(k+n)}$ be a maximal
PS (MPS) code which maximizes the code size among all
PS codes with the same parameters n and P of length k.

The advantage of PS codes relative to maximal comma-free
codes [3] is easy word synchronization recovery, since the
decoder only has to search for the appearance of P in the incoming
stream of symbols. In this paper, we propose simple encoding
and decoding algorithms for an MPS code $\mathcal{G}_P^{(k+n)}$ with
self-uncorrelated prefix P, i.e. such that no prefix of P matches
any suffix of P.

II.  Recursive  Subdivision  of  MPS-codes 

For a prefix P of length k > 1, let $\mathcal{F}_P^{n}$ denote the set of
sequences of length n such that P does not appear at any position
as a string of k consecutive symbols. The autocorrelation of a
sequence $X = x_1 x_2 \cdots x_m$ of length m is defined as a binary
sequence Y of length m, satisfying the property that $y_i = 1$ if
the prefix $x_1 \cdots x_{m-i+1}$ of X and the suffix $x_i \cdots x_m$ of X are
identical, otherwise zero. The autocorrelation of X is denoted
by $X \circ X$. As an example, if P = 1110, then $P \circ P = 1000$.
Note that $P \circ P = 10^{k-1}$ iff P is self-uncorrelated. For two
strings X and Y, let XY be the concatenation of X and Y.
Moreover, for a string X and a set of strings $\mathcal{F}$, let $X\mathcal{F}$ be
$\{XY \mid Y \in \mathcal{F}\}$. The following are the main results we obtained.
Theorem 1 For a prefix P of length k, $P\mathcal{F}_P^{n}$ is an MPS
code $\mathcal{G}_P^{(k+n)}$ if and only if P is self-uncorrelated.

Theorem 2 Let $P_G$ be $1^{k-1}0$. Then it follows that

$$\mathcal{F}_{P_G}^{n} = \{1^n\} \cup \bigcup_{i=1}^{k-1} \{1^{i-1} 0\, \mathcal{F}_{P_G}^{n-i}\}.$$

Note that the recursion on the size of $\mathcal{F}_{P_G}^{n}$ obtained from
Theorem 2 gives us a strictly larger code than Mandelbaum's
code [4], in which Fibonacci recursions [5] are used.


III.  Mapping  Algorithm  for  MPS-codes 

Theorem 2 shows the division of $\mathcal{F}_{P_G}^{n}$ into k distinct subsets.
By recursively applying the theorem to each subset except
the singleton set (consisting of only one element), $\mathcal{F}_{P_G}^{n}$ can be
represented as a collection of $G_{k,n}$ singleton sets, where $G_{k,n}$
is the cardinality of $\mathcal{F}_{P_G}^{n}$. We present an algorithm to find a
singleton set corresponding to an input x with $0 \le x < G_{k,n}$
and the inverse algorithm as well. The scheme is based on
enumerative coding [6].
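
The following sketch shows one way such an enumerative mapping could look. It assumes that the subdivision splits the constrained set by the position of the first 0 (words beginning with i - 1 ones followed by a 0, for i = 1, ..., k - 1, plus the all-ones word); this is an inferred reading, and the function names and the ordering of the subsets are illustrative, not the authors' algorithm.

```python
from functools import lru_cache

def make_mapper(k):
    """Rank/unrank for binary words that avoid the prefix P = 1^(k-1) 0."""

    @lru_cache(maxsize=None)
    def size(n):
        # all-ones word, plus words starting with 1^(i-1) 0 for i = 1..k-1
        return 1 + sum(size(n - i) for i in range(1, min(k - 1, n) + 1))

    def unrank(x, n):
        """Map an integer 0 <= x < size(n) to a constrained word of length n."""
        word = ""
        while n > 0:
            for i in range(1, min(k - 1, n) + 1):
                block = size(n - i)
                if x < block:
                    word += "1" * (i - 1) + "0"
                    n -= i
                    break
                x -= block
            else:
                return word + "1" * n      # x addresses the all-ones word
        return word

    def rank(word):
        """Inverse of unrank; `word` is assumed to avoid P."""
        x, pos, n = 0, 0, len(word)
        while n > 0:
            zero = word.find("0", pos)
            if zero < 0:                   # all-ones tail comes last
                return x + size(n) - 1
            i = zero - pos + 1
            x += sum(size(n - j) for j in range(1, i))
            pos, n = zero + 1, n - i
        return x

    return size, unrank, rank

size, unrank, rank = make_mapper(4)        # P = 1110
print(size(9), unrank(0, 9), unrank(size(9) - 1, 9))  # 326 000000000 111111111
```

Each step of `unrank` and `rank` consumes at least one symbol and inspects at most k - 1 subset sizes, so the work grows linearly with the word length for fixed k.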

IV.  MPS-code  Conversion 

In practical situations, one might use another prefix than
$P_G = 1^{k-1}0$, e.g. a Barker sequence [7]. We show an algorithm
to transform a code word in $\mathcal{G}_{P_G}^{(k+n)}$ to another in $\mathcal{G}_Q^{(k+n)}$
if $Q \circ Q = 10^{k-1}$ holds: Scanning $V \in \mathcal{G}_{P_G}^{(k+n)}$ from the left
to the right, replace Q with $P_G$ when Q is found on V. Then
the sequence obtained must belong to $\mathcal{G}_Q^{(k+n)}$ if the last symbol
$q_k$ of Q is 0. In case $q_k = 1$, negating V and replacing
$P_G$ with the negation of $P_G$, the above conversion procedure
transforms $V \in \mathcal{G}_{P_G}^{(k+n)}$ to a code word in $\mathcal{G}_Q^{(k+n)}$.

V.  Conclusion 

In the conclusion of his paper [4], Mandelbaum comments
on his codes as follows: “These codes seem to have the best
efficiency of all comma-free codes that can be constructed systematically
(no table lookup).” We disprove his statement. A
systematic procedure for mapping data sequences into code
words of a binary MPS code of prefix $P_G$ as well as for the inverse
mapping is presented. The complexity of the proposed
scheme is proportional to the code word length n. In order
to enable the choice of another prefix, methods are presented
which transform $\mathcal{G}_{P_G}^{(k+n)}$ into any $\mathcal{G}_Q^{(k+n)}$ of self-uncorrelated
prefix Q. The mapping procedure and the conversion method
can be generalized for q-ary prefix synchronized codes.
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Abstract  —  Complex  valued  sequences  of  length  n 
are  considered.  A  sequence  is  said  to  be  a  perfect  se¬ 
quence  if  all  the  out-of-phase  periodic  autocorrelation 
coefficients  are  equal  to  zero.  A  sequence  is  said  to  be 
a  phase  shift  keyed  (PSK)  sequence  if  all  the  coordi¬ 
nates  are  on  the  unit  circle.  A  sequence  is  said  to  be 
a  polyphase  sequence  if  all  the  coordinates  are  n’th 
roots of unity. For the case where n is a power of a prime
integer, a partial classification of perfect PSK sequences
is given. As a consequence, the full classification
of one-dimensional bent functions is presented.

I.  Introduction 

Let $x = (x_0, x_1, \ldots, x_{n-1})$ be a complex valued sequence of
length n containing at least one non-zero component. The periodic
cross-correlation function of sequences x and y is given
by $R_{x,y}(\tau) = \sum_{t=0}^{n-1} x_t^* y_{t+\tau}$, $\tau = 0, 1, \ldots, n-1$, where all
the indices are calculated mod n and $x^*$ denotes the complex
conjugation of x. The periodic autocorrelation function
(PAF) of x is defined by $R_x(\tau) := R_{x,x}(\tau)$. The "energy" of
the sequence x is given by $R_x(0) = \sum_{t} |x_t|^2 > 0$.
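
A direct transcription of these definitions, as a minimal sketch in plain Python. It assumes the conjugate is taken on the first sequence, as in the formula above; the function names are illustrative.

```python
def periodic_correlation(x, y):
    """R_{x,y}(tau) = sum_t conj(x_t) * y_{t+tau}, indices taken mod n."""
    n = len(x)
    return [
        sum(complex(x[t]).conjugate() * y[(t + tau) % n] for t in range(n))
        for tau in range(n)
    ]

def is_perfect(x, tol=1e-9):
    """True if every out-of-phase autocorrelation coefficient is zero."""
    return all(abs(r) < tol for r in periodic_correlation(x, x)[1:])

x = [1, 1, 1, -1]
print([abs(r) for r in periodic_correlation(x, x)])  # [4.0, 0.0, 0.0, 0.0]
print(is_perfect(x), is_perfect([1, 1, 1, 1]))       # True False
```

The in-phase value 4.0 is the energy of the length-4 example, and the three out-of-phase zeros are what make it a perfect sequence.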

Consider a set of sequences $\mathcal{M} = \{x_1, x_2, \ldots, x_M\}$,
where $x_i = (x_{i,0}, x_{i,1}, \ldots, x_{i,n-1})$. Let $\mathcal{R}(\mathcal{M}) :=
\{|R_{x_i,x_j}(\tau)|,\ \tau = 0, 1, \ldots, n-1,\ i, j = 1, 2, \ldots, M\}$ be the
set of absolute values of periodic auto and cross correlation
coefficients.

The following simultaneous linear transformations of the
$x_i$'s do not change the set $\mathcal{R}(\mathcal{M})$. In each case
i = 1, 2, ..., M and j = 0, 1, ..., n - 1.
1) Projectivity: $z_i = a x_i$, where |a| = 1 is a complex number
on the unit circle.
2) Cyclic shift: $z_{i,j} = x_{i,j+1}$.
3) Permutation group: $z_{i,j} = x_{i,kj \pmod{n}}$, where gcd(k, n) = 1.
4) "Linear frequency modulation": $z_{i,j} = x_{i,j}\,\xi^{sj}$, where s
is an integer and $\xi$ is a primitive root of unity of degree n.
5) Conjugation: $z_{i,j} = x_{i,j}^*$.

We  refer  to  sequences  z  and  x  as  equivalent  sequences  if 
the  sequence  z  can  be  obtained  from  the  sequence  x  using  a 
number  of  these  transformations. 

II.  Perfect  Sequences 

Let $P = (1/\sqrt{n})\,[\xi^{ij}]$, i, j = 0, 1, ..., n - 1 be the matrix of
the Discrete Fourier Transform (DFT) of dimension n. Let
y = xP be the DFT of a sequence $x = (x_0, x_1, \ldots, x_{n-1})$.
Theorem 1 [1] A sequence x is a perfect sequence if and
only if all the Fourier components, i.e., the components of y,
have the same magnitude.
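
Theorem 1 suggests a simple numerical test: compute the DFT and compare the magnitudes of its components. The sketch below assumes the primitive root $\xi = e^{2\pi i/n}$; any other primitive n-th root only permutes the components, so the set of magnitudes is unaffected. The function names are illustrative.

```python
import cmath

def dft(x):
    """y = xP with P = (1/sqrt(n)) [xi^(ij)], xi = exp(2*pi*i/n) (assumed root)."""
    n = len(x)
    xi = cmath.exp(2j * cmath.pi / n)
    return [
        sum(x[i] * xi ** (i * j) for i in range(n)) / n ** 0.5
        for j in range(n)
    ]

def has_flat_spectrum(x, tol=1e-9):
    """True if all Fourier components have the same magnitude."""
    mags = [abs(v) for v in dft(x)]
    return max(mags) - min(mags) < tol

print(has_flat_spectrum([1, 1, 1, -1]))  # True: this sequence is perfect
print(has_flat_spectrum([1, 1, 1, 1]))   # False: all energy sits in one component
```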

Theorem 1 gives the complete description of the set of all general
perfect sequences. An important problem in the theory of
perfect sequences is finding non-equivalent sequences or finding
the dimension of a set of perfect sequences.


Theorem 2 [1] $\dim \mathcal{P}_n = n - 1$, where $\mathcal{P}_n$ denotes the set of
non-equivalent perfect sequences of energy n.

Theorem 3 [2] The number of non-equivalent perfect PSK
sequences of length $n = p_1 p_2 \cdots p_m$, where the $p_i$'s are distinct
primes, is finite.

The  situation  is  quite  different  when  n  is  not  square  free.  In 
this  case,  there  are  infinitely  many  non-equivalent  perfect  PSK 
sequences. 

Theorem 4 The maximal dimension of a set of perfect PSK
sequences of length $n = p^{2m}$ or $n = p^{2m+1}$ is equal to
$k = p^m - 1$. Such sets can be constructed in explicit form.

III.  Perfect  Polyphase  sequences 
Polyphase sequences are a special subset of PSK sequences.

An index function f(x) is known as a one-dimensional
bent function if and only if the corresponding sequence
$x = (\xi^{f(0)}, \xi^{f(1)}, \ldots, \xi^{f(n-1)})$ is perfect.

The  number  of  different  bent  functions  is  finite.  A  general 
construction  of  bent  functions  is  given  in  [3].  Nevertheless, 
this  construction  does  not  describe  all  the  bent  functions.  We 
give  the  full  classification  of  bent  functions. 

Theorem 5 If n = 2m, m odd, then a bent function does not
exist.

Theorem 6 Let n = p, where p > 3 is a prime. All the bent functions
are quadratic polynomials $f(x) = ax^2 + bx + c$, $a, b, c \in
Z_p$, $a \neq 0$.
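
One direction of Theorem 6 is easy to confirm numerically: a quadratic index function over a prime length gives a perfect polyphase sequence, while a linear one does not. The sketch assumes the sequence is formed as $\xi^{f(j)}$ with $\xi = e^{2\pi i/n}$; the coefficients a = 3, b = 2, c = 5 and the helper names are arbitrary illustrative choices.

```python
import cmath

def polyphase(f, n):
    """Sequence (xi^f(0), ..., xi^f(n-1)) with xi = exp(2*pi*i/n)."""
    return [cmath.exp(2j * cmath.pi * (f(j) % n) / n) for j in range(n)]

def out_of_phase_peak(x):
    """Largest |R_x(tau)| over tau = 1, ..., n-1."""
    n = len(x)
    return max(
        abs(sum(x[t].conjugate() * x[(t + tau) % n] for t in range(n)))
        for tau in range(1, n)
    )

p = 7
quadratic = polyphase(lambda j: 3 * j * j + 2 * j + 5, p)   # a = 3, b = 2, c = 5
linear = polyphase(lambda j: 2 * j + 5, p)                  # a = 0
print(out_of_phase_peak(quadratic) < 1e-9)  # True
print(out_of_phase_peak(linear) < 1e-9)     # False
```

For the linear function every out-of-phase coefficient has magnitude n, which is why the condition a ≠ 0 is needed.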

Theorem 7 Let $n = p^{2k}$. Let $x_0$ and $x_1$ be the unique representation
of x given by $x = x_0 + x_1 p^k$, where $0 \le x_0 \le p^k - 1$,
$0 \le x_1 \le p^k - 1$. Then all the bent functions of length n are
given by

$$f(x) = F(x_0) + x_1 G(x_0) p^k, \qquad (1)$$

where $F(x_0)$ is a function taking values in $Z_n$ and $G(x_0)$ is
a function taking values in $Z_{p^k}$ such that $G(a) \neq G(b)$ if
$a \neq b$, $a, b \in Z_{p^k}$.
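
The construction in Theorem 7 can be tried out on small lengths. The sketch below builds f from two lookup tables and tests the resulting polyphase sequence for perfection, assuming the sequence is $\xi^{f(x)}$ with $\xi = e^{2\pi i/n}$; the tables F and G used in the example are arbitrary illustrative choices and the function names are hypothetical.

```python
import cmath

def bent_from_tables(p, k, F, G):
    """f(x) = F[x0] + x1 * G[x0] * p^k (mod n), x = x0 + x1 * p^k, n = p^(2k)."""
    q = p ** k
    if sorted(G) != list(range(q)):
        raise ValueError("G must be one-to-one on Z_{p^k}")
    n = q * q
    return [(F[x % q] + (x // q) * G[x % q] * q) % n for x in range(n)]

def is_perfect_polyphase(f, tol=1e-9):
    """True if the sequence xi^f(x), xi = exp(2*pi*i/n), is perfect."""
    n = len(f)
    x = [cmath.exp(2j * cmath.pi * v / n) for v in f]
    return all(
        abs(sum(x[t].conjugate() * x[(t + tau) % n] for t in range(n))) < tol
        for tau in range(1, n)
    )

f9 = bent_from_tables(3, 1, F=[0, 5, 7], G=[2, 0, 1])   # n = 9
f4 = bent_from_tables(2, 1, F=[0, 0], G=[0, 1])         # n = 4
print(f4, is_perfect_polyphase(f4), is_perfect_polyphase(f9))  # [0, 0, 0, 2] True True
```

If G is not one-to-one the property is lost: the index function [0, 0, 2, 2] of length 4, which corresponds to G(0) = G(1), gives the sequence (1, 1, -1, -1), whose autocorrelation at shift 2 is -4.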

For the case $n = p^{2k+1}$, there exists a similar theorem.
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I.  Introduction 

Until  recently,  most  known  decoding  procedures  for  error- 
correcting  codes  were  based  either  on  algebraically  calculating 
the  error  pattern  or  on  some  sort  of  tree  or  trellis  search.  With 
the  advent  of  turbo  coding  [1],  a  third  decoding  principle  has 
finally had its breakthrough: iterative decoding.

(Iterative  decoding  is  not  a  new  idea,  though:  most  of 
the  key  ideas  were  already  present  in  Gallager’s  work  on  low- 
density  parity-check  codes  [2].) 

With  respect  to  Viterbi  decoding,  a  code  is  most  naturally 
described  by  means  of  a  trellis  diagram.  The  main  thesis  of 
the  present  paper  is  that,  with  respect  to  iterative  decoding, 
the  natural  way  of  describing  a  code  is  by  means  of  a  Tanner 
graph  [3],  which  may  be  viewed  as  a  generalized  trellis.  More 
precisely,  it  is  the  “time  axis”  of  a  trellis  that  is  generalized 
to  a  Tanner  graph. 

Trellises  yield  Tanner  graphs  of  the  type  shown  in  Fig.  1; 
in  particular,  the  graph  has  no  cycles.  The  complexity  re¬ 
duction  in  turbo  codes  (and  low-density  parity-check  codes, 
and  many  new  codes  to  be  discovered)  comes  from  allowing 
Tanner  graphs  with  cycles,  cf.  Fig.  2. 
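
Whether a given graph has cycles is easy to test in the simplest setting, where the Tanner graph is the bipartite graph of a parity-check matrix (check nodes on one side, symbol nodes on the other). The sketch below covers only that special case; the graphs considered in this paper may also contain state nodes, which are not modelled here. The function names are illustrative.

```python
def tanner_edges(H):
    """Edges (check i, symbol j) of the bipartite graph of a parity-check matrix."""
    return [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]

def has_cycle(H):
    """Union-find cycle test on the bipartite graph defined by H."""
    n_checks = len(H)
    parent = list(range(n_checks + len(H[0])))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for i, j in tanner_edges(H):
        a, b = find(i), find(n_checks + j)
        if a == b:          # both ends already connected: this edge closes a cycle
            return True
        parent[a] = b
    return False

print(has_cycle([[1, 1, 0], [0, 1, 1]]))  # False: the graph is a tree
print(has_cycle([[1, 1, 0], [1, 1, 1]]))  # True: two checks share two symbols
```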

II.  Decoding 

Both  Viterbi  decoding  and  BCJR  decoding  [4]  are  easily 
generalized  to  arbitrary  Tanner  graphs  without  cycles,  where 
these  algorithms  are  still  optimal  (in  the  same  sense  as  for 
trellises).  The  basic  idea  of  iterative  decoding  is  simply  to 
apply  these  algorithms  even  to  Tanner  graphs  with  cycles, 
ignoring  the  fact  that  the  algorithms  are  no  longer  optimal. 
The empirical success of turbo coding (as well as our own
experiments with other types of codes) confirms the validity of
this approach.

Of  course,  analytical  understanding  of  the  decoder  opera¬ 
tion  is  also  desirable.  Our  main  result  here  applies  to  “cy¬ 
cle  codes”  (a  subclass  of  low-density  parity-check  codes):  we 
give  a  complete  algebraic  characterization  of  all  error  patterns 
that  are  corrected  by  the  generalized  Viterbi  algorithm  after 
infinitely  many  iterations. 

III.  Realization  Theory  on  General  Graphs 

Much  recent  work  was  devoted  to  finding,  and  bounding  the 
size  of,  the  “smallest”  trellis  for  a  given  code.  This  problem  is 
significantly  generalized  by  considering  general  Tanner  graphs. 

In  the  traditional  setting,  the  only  degree  of  freedom  (for 
a  given  code)  was  the  ordering  of  the  “time  axis”.  For  a 
given  ordering,  every  linear  code  has  a  well-defined  unique 
minimal  trellis,  and  every  other  trellis  for  the  same  code  may 
be  collapsed  to  the  minimal  trellis  by  state  merging. 

In  our  more  general  setting,  the  “time  axis”  need  not  be  or¬ 
dered,  but  may  be  an  arbitrary  Tanner  graph.  Even  for  a  fixed 


Tanner  graph,  there  is,  in  general,  no  unique  minimal  trellis. 
(The simplest examples are tail-biting trellises.) Nevertheless,
bounds  on  the  “size”  of  the  realization  may  be  obtained  from 
the  (“abstract”)  state  spaces  of  the  code. 

IV.  A  Priori  Probabilities 

Our  careful  derivation  of  the  two  basic  iterative  decoding 
algorithms  clarifies,  in  particular,  what  a  priori  distributions 
are  admissible  and  how  they  are  properly  dealt  with.  As  it 
turns  out,  these  distributions  are  closely  related  to  Markov 
Random  fields. 
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Abstract  —  We  describe  a  concatenated  coding  system  with 
iterated  sequential  inner  decoding.  The  system  uses  convolutional 
codes  of  very  long  constraint  length  and  operates  on  iterations  be¬ 
tween  an  inner  Fano  decoder  and  an  outer  Reed-Solomon  decoder. 

I.  INTRODUCTION 

We  consider  a  concatenated  system  with  a  convolutional  inner  code, 
a  block  interleaver  of  degree  I,  and  I  outer  RS  codes  of  the  same 
length,  but  with  different  redundancies/error  correcting  capabilities. 
After  encoding  by  the  outer  codes  and  interleaving,  the  frame  is  split 
into  a  number  of  subframes.  These  are  encoded  by  a  memory  M  con¬ 
volutional  code,  which  is  terminated  by  M  input  zeroes.  The  decoding 
for the inner code is performed by a number of sequential Fano decoders,
which perform forward and backward (i.e. starting from the end
of  the  subframe)  decoding  simultaneously,  and  on  all  the  subframes 
in  parallel.  The  process  is  monitored  such  that  decoded  symbols  from 
the  inner  code  in  each  of  the  RS  words  are  counted.  The  non-decoded 
symbols  are  treated  as  erasures. 

In a chosen implementation we use I = 8, the error correction
profile [16 50 16 6 6 16 6 6] for the outer codes, and M = 23. The
first three RS decoders and the sixth RS decoder are errors-and-erasures
decoders, which can correct e erasures and t errors as long as
$e + 2t \le 100$ and $e \le 68$ (profile value 50), respectively
$e + 2t \le 32$ and $e \le 16$ (profile value 16). The other RS
decoders are errors-only decoders.

The  first  RS  decoding  attempt  is  then  performed  on  the  second 
RS  word  when  187  decoded  symbols  in  this  word  are  available  from 
the  inner  decoders,  and  in  case  of  a  decoding  failure  (more  than  50 
errors  detected)  a  new  attempt  is  performed  each  time  a  new  decoded 
symbol  is  available  from  the  inner  decoders.  When  the  word  is  de¬ 
coded  the  result  is  fed  back  and  used  to  guide  the  sequential  decoders 
in  the  continued  decoding,  i.e.  the  sequential  decoders  are  forced  to 
follow  paths  in  the  tree  which  agree  with  the  RS  decoded  data.  De¬ 
coding  more  and  more  RS  words  and  feeding  the  results  back  to  the 
inner decoders will in this way iterate the process towards a successful
decoding of the full frame. If 3 consecutive RS words (24 bits > M)
are  decoded,  the  forced  inner  decoding  will  effectively  split  the  full 
frame  into  sub-sub-frames  of  length  48  bits,  which  can  be  decoded 
independently  in  both  directions.  If  the  decoding  in  one  of  these  is 
stuck,  a  jump  to  the  next  sub-sub-frame  can  be  made  with  only  a 
small  penalty. 

II.  Results  for  Iterated  Sequential  Decoding 

In  a  system  where  sequential  decoding  is  used  the  code  should  have 
a  good  (or  optimum)  distance  profile  (DP)  together,  of  course,  with  a 
large  free  distance.  However,  if  decoding  is  performed  forward  and 
backward  on  a  frame  (or  subframe)  a  suitable  code  must  also  have  a 
good  distance  profile  in  its  reversed  form,  since  this  is  the  code  used 
in  the  backward  decoding.  From  a  code  search  we  obtained  the  fol¬ 
lowing  ODP  memory  M  =  23,  df=54  code  written  in  hexadecimal 
form 

G  =  [  96A77B  B7EA67  D0A25D  E1C4D9  ] 

which  also  has  a  very  good  DP  in  reversed  form. 

In the simulations we have used an AWGN channel quantized
into 16 levels, and the quantizing thresholds are Es/N0-dependent as
is common practice. The Fano decoders use an ordinary Fano metric,
which in our case is unquantized (32 bit floating-point words). As
a preliminary value of $\Delta$ we have used the ratio $\Delta / bm_{\max} = 6$, where
$bm_{\max}$ is the maximum branch metric. We have chosen to use interleaving
degree I = 8, but a further gain may be available by increasing
the interleaving degree. A good choice for the number of subframes
was determined through simulations to be 15. The choice of profile
(and rate) for the outer codes is by no means obvious and deserves
further investigation. Our simulation approach includes a number of
different profiles, and the one chosen here has given the best results.
Very good profiles with only two different outer codes do
also exist.

In Figure 1 we have shown the average number of computations
Cav found by simulation runs of 1000 frames (a total of 14,368,000
information bits) with different signal to noise ratios. No errors
appeared at all. The Eb/N0 values specified are the net values for the entire
system. Since the overall rate of the system is Roverall = 0.216 (including
the small loss introduced by the termination of subframes) we
notice that the inner sequential decoder operates at an Es/N0 which is
6.66 dB below Eb/N0. With a computational cut-off rate Rcomp that
falls below the convolutional code rate of 1/4 for Eb/N0 < 2.55 dB,
these results support our claim that a sequential decoder can operate
well above Rcomp if some kind of side information is available. In this
case the side information is achieved by using an outer code.
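
The quoted figures can be reproduced with a few lines of arithmetic. The sketch below rests on assumptions not stated explicitly in the text: RS codes of length 255 over 8-bit symbols (consistent with the 187-symbol decoding threshold and the 24 bits spanned by three RS words), the profile read as eight error-correcting capabilities 16, 50, 16, 6, 6, 16, 6, 6, and each of the 15 subframes terminated by M = 23 zeros.

```python
import math

profile = [16, 50, 16, 6, 6, 16, 6, 6]   # error-correcting capability t of each RS code
rs_length, bits_per_symbol = 255, 8      # assumed: RS codes over GF(256)
memory, subframes, inner_rate = 23, 15, 1 / 4

# Each RS word carries rs_length - 2t information symbols.
info_bits = sum(rs_length - 2 * t for t in profile) * bits_per_symbol
# Bits entering the inner encoder: all RS symbols plus the termination zeros.
frame_bits = len(profile) * rs_length * bits_per_symbol
channel_bits = (frame_bits + subframes * memory) / inner_rate
overall_rate = info_bits / channel_bits
offset_db = -10 * math.log10(overall_rate)   # Eb/N0 minus Es/N0, in dB

print(info_bits, round(overall_rate, 3), round(offset_db, 2))  # 14368 0.216 6.66
```

The 14,368 information bits per frame match the 14,368,000 bits reported for 1000 frames, and the rate and the 6.66 dB offset agree with the values in the text.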

We notice that for $E_b/N_0 = 1.0$ dB we can build a decoder with a
$C_{av}$ that is at least 100 times smaller than the 16,384 decoding
operations used by the Viterbi decoder for the n=4, M=14 code used in
the Galileo mission. For $E_b/N_0 = 1.0$ dB no frame required more than
2000 computations per bit and very few frames required more than
500 computations per bit. In the $E_b/N_0 = 0.6$ dB case,
3% of the frames require more than 10,000 computations, and 2% of
the frames require more than the Galileo code Viterbi decoder.
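
The two figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the 16,384 operations of the Viterbi decoder correspond to the 2^14 trellis states of a memory-14 code.

```python
import math

def es_minus_eb_db(overall_rate: float) -> float:
    """Offset in dB between Es/N0 and Eb/N0 for a binary system of the given rate."""
    return 10.0 * math.log10(overall_rate)

def trellis_states(memory: int) -> int:
    """Number of states of a binary convolutional code with the given memory."""
    return 2 ** memory

# Overall rate 0.216 quoted in the text.
offset_db = es_minus_eb_db(0.216)

# Memory M = 14 of the Galileo code quoted in the text.
states = trellis_states(14)

print(f"Es/N0 - Eb/N0 = {offset_db:.2f} dB")
print(f"states = {states}, one hundredth of that = {states / 100:.2f}")
```

Under these assumptions, a decoder that is 100 times cheaper would average fewer than about 164 computations per bit.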

III.  Conclusion 

We have described a very efficient scheme utilizing iterated
sequential decoding. The number of computations depends on the
profile chosen and on the strategy used for the inner decoding, but,
as demonstrated, very good results can be obtained.


Figure 1: Average number of computations per bit as a function of $E_b/N_0$.

I. Introduction

The number of iterations of an iterative optimal or suboptimal
decoding scheme [1-5] for binary linear block codes can be reduced,
without any effect on its error performance, by testing a sufficient
condition on the optimality of a candidate codeword. In this paper,
the least stringent sufficient condition on the optimality of a
decoded codeword is investigated under the assumption that the
available information on the code is restricted to (1) the minimum
weight or the distance profile and (2) for a given positive integer
$h$, $h$ or fewer already generated candidate codewords. The least
stringent sufficient conditions of optimality for $1 \le h \le 3$ are
presented. $\mathrm{Cond}_1$ is the same as the one given in [2],
$\mathrm{Cond}_2$ is less stringent than the one given in [3], and
$\mathrm{Cond}_1$ and $\mathrm{Cond}_2$ are derived from
$\mathrm{Cond}_3$ as special cases. These conditions can be used
effectively to save computer simulation time for evaluating error
probability for maximum likelihood decoding.

As examples, we consider Chase Algorithm II [1] and two
iterative decoding algorithms [5,6] for $RM_{5,1}$, $RM_{5,2}$,
$RM_{6,2}$, and $RM_{6,3}$, where $RM_{m,r}$ denotes the $r$-th order
Reed-Muller code of length $2^m$. Majority-logic decoding with
random tie-breaking is used to generate candidate codewords. For
an AWGN channel and BPSK, the effectiveness of $\mathrm{Cond}_h$ for
$1 \le h \le 3$ is evaluated by simulation.

II.  Sufficient  Conditions  on  the  Optimality  of 
a  Decoded  Codeword 

Suppose a binary block code $C$ of length $N$ with distance
profile $W$ is used for error control over the AWGN channel using
BPSK. A codeword $c$ is mapped into $x \in \{-1, 1\}^N$. Suppose
$x$ is transmitted and $r$ is the received sequence at the output
of the matched filter of the receiver. Let $z = (z_1, z_2, \ldots, z_N)$ be
the binary hard-decision sequence obtained from $r$.

Let $V^N$ denote the set of all binary $N$-tuples. For
$u = (u_1, u_2, \ldots, u_N)$ in $V^N$,
$D_1(u) \triangleq \{i : u_i \ne z_i, \text{ and } 1 \le i \le N\}$,
$D_0(u) \triangleq \{1, 2, \ldots, N\} - D_1(u)$, $n(u) \triangleq |D_1(u)|$, and
$L(u) \triangleq \sum_{i \in D_1(u)} |r_i|$. For MLD, the decoder finds the optimal
codeword $c_{opt}$, for which $L(c_{opt}) = \min_{c \in C} L(c)$. If there
exists $c^* \in C$ for which
$L(c^*) \le \alpha(c^*) \triangleq \min_{c \in C, c \ne c^*} L(c)$, then
$c^* = c_{opt}$. If it is possible to determine a tight lower bound
on $\alpha(c^*)$, we have a sufficient condition on the optimality of
a candidate codeword.
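
As an illustration of these definitions, the following Python sketch computes $L(u)$ and checks, on a toy example, that minimising $L$ over the code agrees with maximising the correlation with $r$. It is a minimal sketch under stated assumptions: the mapping of bit 1 to +1 and bit 0 to -1, the length-4 even-weight code, and the received vector are all chosen here for illustration and do not come from the paper.

```python
from itertools import product

def hard_decision(r):
    # Assumption: bit 1 is transmitted as +1 and bit 0 as -1.
    return [1 if ri >= 0 else 0 for ri in r]

def L(u, r):
    """Sum of |r_i| over the positions where u disagrees with the hard decision z."""
    z = hard_decision(r)
    return sum(abs(ri) for ui, zi, ri in zip(u, z, r) if ui != zi)

def correlation(u, r):
    """Correlation between the BPSK image of u and the received sequence r."""
    return sum((2 * ui - 1) * ri for ui, ri in zip(u, r))

# Toy code (assumption): all even-weight binary words of length 4.
code = [c for c in product((0, 1), repeat=4) if sum(c) % 2 == 0]
r = [0.9, -0.2, 0.4, 1.1]

c_star = min(code, key=lambda c: L(c, r))             # candidate with the smallest L
c_corr = max(code, key=lambda c: correlation(c, r))   # maximum-correlation codeword

# alpha(c*) is the smallest L among all other codewords.
alpha = min(L(c, r) for c in code if c != c_star)
is_optimal = L(c_star, r) <= alpha
```

Here `alpha` is found by exhaustive search, which is only feasible for tiny codes; the point of the conditions studied in the paper is to bound it from below without such a search.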

For simplicity, assume that the bit positions are ordered
with $|r_1| \le |r_2| \le \cdots \le |r_N|$. For a subset $X$ of $\{1, 2, \ldots, N\}$
and a positive integer $j \le |X|$, let $X^{(j)}$ denote the set of $j$
smallest integers in $X$. For $j \le 0$, $X^{(j)} \triangleq \emptyset$ (empty set) and for
$j \ge |X|$, $X^{(j)} \triangleq X$. Let $d_H(*,*)$ denote the Hamming distance.

9115400 and BCS-9115400, NASA Grant NAG 5-931 and the Min¬

For $h > 0$, let $B^h$ denote the set of binary sequences
of length $h$. For $\alpha \in B^h$ and $1 \le i \le h$, let $\mathrm{pr}_i\,\alpha$
denote the $i$-th bit of $\alpha$. For $u_1, u_2, \ldots, u_h$ and $u$ in
$V^N$, $D_\alpha \triangleq \bigcap_{i=1}^{h} D_{\mathrm{pr}_i \alpha}(u_i)$,
$n_\alpha \triangleq |D_\alpha|$ and $q_\alpha \triangleq |D_1(u) \cap D_\alpha|$.
For $d_1, d_2, \ldots, d_h$ in $W - \{0\}$,
$V^N_{d_1,d_2,\ldots,d_h} \triangleq \{u \in V^N : d_H(u, u_i) \ge d_i \text{ for } 1 \le i \le h\}$.
Then, $u \in V^N_{d_1,d_2,\ldots,d_h}$ if and only if

$$\sum_{\alpha \in B^h} (-1)^{\mathrm{pr}_i \alpha} q_\alpha \ge \delta_i \triangleq d_i - n(u_i), \quad \text{for } 1 \le i \le h. \qquad (1)$$
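
Relation (1) rests on the identity $d_H(u, u_i) = n(u_i) + \sum_{\alpha \in B^h} (-1)^{\mathrm{pr}_i \alpha} q_\alpha$: on positions where $u_i$ agrees with $z$, $u$ differs from $u_i$ exactly when it differs from $z$, and on the remaining positions exactly when it agrees with $z$. The Python sketch below checks this identity on random vectors. The function name, the parameters, and the use of random test vectors are illustrative assumptions, not part of the paper.

```python
import random
from itertools import product

def check_identity(N=8, h=2, trials=200, seed=1):
    """Check d_H(u, u_i) == n(u_i) + sum_alpha (-1)^(pr_i alpha) * q_alpha."""
    rng = random.Random(seed)
    for _ in range(trials):
        z = [rng.randint(0, 1) for _ in range(N)]            # hard decisions
        us = [[rng.randint(0, 1) for _ in range(N)] for _ in range(h)]
        u = [rng.randint(0, 1) for _ in range(N)]
        d1_u = {p for p in range(N) if u[p] != z[p]}          # D_1(u)
        for i in range(h):
            total = 0
            for alpha in product((0, 1), repeat=h):
                # D_alpha: positions where u_j differs from z iff alpha_j = 1.
                d_alpha = {
                    p for p in range(N)
                    if all((us[j][p] != z[p]) == bool(alpha[j]) for j in range(h))
                }
                q_alpha = len(d1_u & d_alpha)
                total += (-1) ** alpha[i] * q_alpha
            n_ui = sum(us[i][p] != z[p] for p in range(N))    # n(u_i)
            d_h = sum(u[p] != us[i][p] for p in range(N))     # d_H(u, u_i)
            if d_h != n_ui + total:
                return False
    return True

identity_holds = check_identity()
```

With the identity in hand, the requirement $d_H(u, u_i) \ge d_i$ is the same as the inequality in (1).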

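Let $Q$ denote the set of those $2^h$-tuples over nonnegative
integers which satisfy (1). We say
$q = (q_{00\cdots0}, q_{0\cdots01}, \ldots, q_{11\cdots1}) \in Q$ is minimal if and
only if there is no
$q' = (q'_{00\cdots0}, q'_{0\cdots01}, \ldots, q'_{11\cdots1}) \in Q$ such that
$q \ne q'$ and $q_\alpha \ge q'_\alpha$ for any $\alpha$ in $B^h$. Let $Q_{min}$
denote the set of minimal tuples in $Q$. Then

$$\min_{u \in V^N_{d_1,d_2,\ldots,d_h}} L(u) = \min_{q \in Q_{min}} \sum_{i \in \bigcup_{\alpha \in B^h} D_\alpha^{(q_\alpha)}} |r_i|. \qquad (2)$$
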

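Example: Let $h = 2$. It is proved in [4] that

$$\min_{u \in V^N_{d_1,d_2}} L(u) = \sum_{i \in \left(D_{00} \cup D_{01}^{(\lfloor(\delta_1 - \delta_2)/2\rfloor)}\right)^{(\delta_1)}} |r_i|. \qquad (3)$$
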

For $u_1 = u_2$, that is, $h = 1$ and $d_1 = d_2 =$ the minimum
distance, equality (3) reduces to the formula given in Theorem
1 in [2].

For $u_1 \ne u_2$, the right-hand side of (3) is shown to be
tighter than the lower bound
$\sum_{i \in D_0(u_1)^{(\lceil(\delta_1 + \delta_2)/2\rceil)}} |r_i|$ in [3].
For $h = 3$, we derive a formula for $\min_{u \in V^N_{d_1,d_2,d_3}} L(u)$
in [4].

III.  Simulation  Result 

Simulation results show that $\mathrm{Cond}_2$ is effective in all cases.
$\mathrm{Cond}_3$ is slightly more effective than $\mathrm{Cond}_2$. The
improvement of $\mathrm{Cond}_2$ over $\mathrm{Cond}_1$ is relatively small
for $RM_{6,2}$ and $RM_{6,3}$. The details are shown in [4,6].
Abstract — A MAP decoding algorithm is described that can
greatly  speed  up  computer  simulations  of  turbo  coding  schemes 
and  which  allows  the  practical  implementation  of  turbo  codes  in 
their  most  powerful  form. 

I.  Introduction 

The discovery of Turbo codes and the claim that they can perform
within 0.7 dB of Shannon capacity for 1 bit/sym [1] have generated
considerable interest within the coding community. The heart of an
iterative decoding algorithm for Turbo codes described in [1] is the use
of a Maximum a Posteriori (MAP) decoding algorithm derived from
[2]. This MAP decoding algorithm is extremely complicated and
greatly limits the decoding speed possible (since two MAP decoders are
required in each iteration stage of the Turbo decoding algorithm, which
may have up to 18 stages).

A great simplification of the MAP decoding algorithm is given in
[3]. For a rate $1/n$ systematic convolutional code with memory $\nu$ (and
$M = 2^\nu$ states) this algorithm involves $4M$ additions, $6M + 2n - 1$
multiplications, one division, and $n$ exponentials (for an additive white
Gaussian noise channel) per decoded bit. By taking the negative logarithm
of the algorithm (the logarithm is used in [3]) we can convert the
multiplications, divisions, and exponentials to additions and
subtractions only (the exponentials conveniently disappear). However,
the addition operation becomes the E operation defined below:

$$x \,E\, y = -\ln(e^{-x} + e^{-y}). \qquad (1)$$

We can simplify (1) to

$$x \,E\, y = \min(x, y) - \ln(1 + e^{-|y - x|}). \qquad (2)$$

The E operation is then reduced to finding the minimum of $x$ and $y$ and
a function dependent only on the difference between $x$ and $y$.

We can see that the maximum of $f(z) = \ln(1 + e^{-z})$, $z \ge 0$, is small,
equal to $\ln(2) \approx 0.693$. With increasing $z$, $f(z)$ quickly decays to near
zero for $z > 7$. In a computer simulation $z$ can be quantised to some
maximum value and a look up table used to find $f(z)$. This greatly
speeds up the MAP decoding algorithm with almost no degradation in
performance. This technique can also be used in a hardware
implementation of the MAP algorithm using very small look up tables.
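
The look up table idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation described in the paper: the maximum value `Z_MAX = 8.0`, the 64-entry table, and the midpoint quantisation are assumptions chosen here for the example.

```python
import math

def e_exact(x, y):
    """x E y = -ln(exp(-x) + exp(-y)), evaluated directly."""
    return -math.log(math.exp(-x) + math.exp(-y))

def f(z):
    """Correction term f(z) = ln(1 + exp(-z)) for z >= 0."""
    return math.log1p(math.exp(-z))

# Assumed quantisation: 64 uniform cells on [0, Z_MAX), each represented by
# the value of f at the cell midpoint.
Z_MAX = 8.0
TABLE_SIZE = 64
STEP = Z_MAX / TABLE_SIZE
TABLE = [f((i + 0.5) * STEP) for i in range(TABLE_SIZE)]

def e_lookup(x, y):
    """x E y = min(x, y) - f(|y - x|), with f read from the table."""
    z = abs(y - x)
    if z >= Z_MAX:
        return min(x, y)          # f(z) is treated as zero beyond Z_MAX
    return min(x, y) - TABLE[int(z / STEP)]

# Largest deviation from the exact operation over a grid of test points.
grid = [-5.0 + 0.37 * i for i in range(28)]
worst = max(abs(e_lookup(x, y) - e_exact(x, y)) for x in grid for y in grid)
```

Because $|f'(z)| \le 1/2$, the midpoint table with cell width 0.125 is off by at most about 0.03, which is consistent with the claim that the table costs almost nothing in accuracy.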

II.  A  MAP  Decoding  Algorithm 

The log likelihood ratio of a transmitted information bit at time $k$
($d_k$) can be shown to be [3] (the division in [3] should actually be a
subtraction)

$$L(d_k) = \mathop{E}_{m=0}^{M-1} \left[A_k^0(m) + B_k^0(m)\right] - \mathop{E}_{m=0}^{M-1} \left[A_k^1(m) + B_k^1(m)\right], \qquad (3)$$

where we define

$$\mathop{E}_{m=0}^{M-1} g(m) = g(0) \,E\, g(1) \,E\, \cdots \,E\, g(M-1). \qquad (4)$$

$A_k^i(m)$ and $B_k^i(m)$ can be computed iteratively as

$$A_k^i(m) = D_i(R_k, m) + \mathop{E}_{j=0}^{1} A_{k-1}^j(S_b^j(m)), \qquad (5)$$

$$B_k^i(m) = \mathop{E}_{j=0}^{1} \left[B_{k+1}^j(S_f^i(m)) + D_j(R_{k+1}, S_f^i(m))\right], \qquad (6)$$

where $S_f^i(m)$ and $S_b^i(m)$ are the states you go to from state $m$ along
the path $d_k = i$ forwards and backwards in time, respectively. $R_k$ is the
received length $n$ vector at time $k$. The branch metric $D_i(R_k, m)$ is
defined for a rate 1/2 code as

$$D_i(R_k, m) = -\frac{2}{\sigma^2}\left(x_k i + y_k Y^i(m)\right), \qquad (7)$$

where $R_k = (x_k, y_k)$, $x_k = (2d_k - 1) + p_k$, $y_k = (2Y_k - 1) + q_k$,
$p_k$ and $q_k$ are two independent normally distributed random variables
with variance $\sigma^2$, $Y_k$ is the coded bit at time $k$, and $Y^i(m)$ is
the coded bit for state $m$ and $d_k = i$.

For a length $N$ coded sequence (starting and finishing in state 0) the
algorithm follows these steps:

1) Starting at time $k = 0$, compute $D_i(R_k, m)$ using (7) for all received
symbols and store in an array of size $2^n N$.

2) Initialise $B_{N-1}^i(S_b^i(0)) = 0$ for $i = 0, 1$ and $B_{N-1}^i(m) = \infty$
for all other $m$ and $i$. Starting at time $k = N - 2$, iteratively compute
$B_k^i(m)$ using (6) and store in an array of size $MN$ (since
$B_k^1(m) = B_k^0(m')$ where $S_f^1(m) = S_f^0(m')$ we can reduce the array
size by half).

3) Initialise $A_0^i(0) = D_i(R_0, 0)$ for $i = 0, 1$ and $A_0^i(m) = \infty$ for
all $m \ne 0$ and $i = 0, 1$. Starting at time $k = 1$, iteratively compute
$A_k^i(m)$ using (5). For each $k$ compute $L(d_k)$ using (3).

The  “state  metrics”  Ak(m)  and  Bk(m)  need  to  be  renormalised  after 
each  iteration  to  prevent  the  metrics  from  overflowing.  This  is  achieved 
by  subtracting  the  smallest  state  metric  at  each  k  (previously  this  would 
have  been  done  by  division). 
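
Subtracting a common constant from the state metrics is harmless because the E operation satisfies $(x - c) \,E\, (y - c) = (x \,E\, y) - c$. The Python sketch below illustrates this. It assumes that the log likelihood ratio is the difference of two E-sums over the states, one for each value of the information bit, as in (3); the helper names and the metric values are made up for the example.

```python
import math
from functools import reduce

def e_op(x, y):
    """x E y = min(x, y) - ln(1 + exp(-|y - x|)); finite inputs only."""
    return min(x, y) - math.log1p(math.exp(-abs(y - x)))

def e_sum(values):
    """E-sum g(0) E g(1) E ... E g(M-1) over a list of metrics."""
    return reduce(e_op, values)

def llr(a0, b0, a1, b1):
    """Assumed form of (3): E-sum of A^0+B^0 minus E-sum of A^1+B^1."""
    s0 = e_sum([a + b for a, b in zip(a0, b0)])
    s1 = e_sum([a + b for a, b in zip(a1, b1)])
    return s0 - s1

def normalise(m0, m1):
    """Subtract the smallest metric at this time step from every metric."""
    smallest = min(min(m0), min(m1))
    return [m - smallest for m in m0], [m - smallest for m in m1]

# Made-up forward and backward metrics for M = 4 states.
a0, a1 = [12.0, 15.2, 13.5, 19.0], [14.1, 12.4, 18.3, 16.6]
b0, b1 = [3.0, 0.5, 2.2, 4.1], [1.7, 2.9, 0.3, 5.0]

before = llr(a0, b0, a1, b1)
na0, na1 = normalise(a0, a1)
after = llr(na0, b0, na1, b1)
```

The same constant is removed from both E-sums, so it cancels in the difference and `after` equals `before` up to rounding.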

A serial MAP decoder is being designed which uses a modified form
of the above algorithm. The received samples are quantised into six bits
with eight bit state metrics. We have $N = 2^{16}/M$ where $M$ is
programmable from 4 to 512 states. The decoder is able to decode any
systematic code with rates from 1/2 to 1/4. Limiting is used to prevent
overflow and small 64x4 lookup tables are used to implement $f(z)$. The
maximum bit rate is $10^7/(M + 14)$ bit/s (556 to 19 kbit/s for $M$ = 4 to 512
states, respectively).
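
The quoted figures can be reproduced directly. The short Python sketch below reads the bit rate as $10^7/(M + 14)$ bit/s and the block length as $N = 2^{16}/M$; both readings, and the function names, are assumptions made for this check.

```python
def max_bit_rate(num_states: int) -> float:
    """Maximum bit rate in bit/s, read as 10^7 / (M + 14)."""
    return 1e7 / (num_states + 14)

def block_length(num_states: int) -> int:
    """Block length N, read as 2^16 / M."""
    return 2 ** 16 // num_states

# Bit rate in kbit/s and block length for the smallest and largest trellis.
summary = {
    m: (round(max_bit_rate(m) / 1000), block_length(m))
    for m in (4, 512)
}
print(summary)
```

The smallest trellis (M = 4) gives the highest rate and the longest block, and the largest trellis (M = 512) gives the lowest rate and the shortest block.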

Four  Xilinx  XC3100A  programmable  gate  arrays  are  being  used, 
together  with  several  64Kx4  static  RAMs  and  1Kx8  dual  port  static 
RAMs.  With  an  additional  XC3100A  chip,  two  MAP  decoders  with 
depth  64K  random  interleaving  can  be  implemented  on  a  single  board 
as  one  stage  of  a  turbo  decoder  (the  inner  and  outer  code  must  be  the 
same).  Each  board  can  have  its  data  fed  back  or  passed  onto  another 
board  depending  on  the  required  speed/complexity  requirements. 

References 

[1] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon
limit error-correcting coding and decoding: Turbo-Codes,”
ICC '93, Geneva, Switzerland, pp. 1064-1070, May 1993.

[2] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding
of linear codes for minimizing symbol error rate,” IEEE Trans.
Inform. Theory, vol. IT-20, pp. 284-287, Mar. 1974.

[3] S. S. Pietrobon and A. S. Barbulescu, “A simplification of the
modified Bahl decoding algorithm for systematic convolutional
codes,” Int. Symp. Inform. Theory & its Applic., Sydney, Australia,
pp. 1073-1077, Nov. 1994.




On  the  convergence  of  the  iterated  decoding  algorithm 

Giuseppe  Caire,  Giorgio  Taricco,  and  Ezio  Biglieri1 

Dipartimento di Elettronica • Politecnico • Corso Duca degli Abruzzi 24 • I-10129 Torino (Italy)
fax: +39 11 5644099 • e-mail: <name>@polito.it


Abstract — Recently, the concept of iterated decoding
of concatenated codes has been developed. Successful
applications of this concept include turbo-codes and
soft-decoding of product codes. After stating the iterated
decoding algorithm formally, we provide a conjecture
on the convergence and the asymptotic optimality
of this algorithm.

I.  Introduction 

Consider the following decoding strategy for block codes over
a memoryless channel [1, 4]. The receiver observes the
“unconstrained a posteriori” (UAP) distribution $Q$, and feeds it
to the decoder to obtain a new distribution $P$ which satisfies
the code constraints and minimizes the directed divergence
(or cross-entropy) $D(P \,\|\, Q)$ between $Q$ and $P$. If $I_C$ denotes
the set of constraints introduced by code $C$, we write

$$P = Q \circ I_C \qquad (1)$$

to describe the operation of the decoder.

Decoding by cross-entropy minimization consists of computing
$P$ given $Q$ (i.e., given the observed channel output
and the knowledge of the channel transition distribution),
and then selecting the code word $x$ which maximizes $P$.
A “soft-output ML decoder” can be thought of as a device
performing operation (1) [1]. Standard variational-calculus
techniques provide the solution


$$Q \circ I_C = Q(x)\, I_C(x) \left[\sum_{x \in C} Q(x)\right]^{-1} \qquad (2)$$
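
Solution (2) says that the constrained distribution is $Q$ restricted to the code and renormalised. The Python sketch below illustrates this on a toy case. It is only a sketch under stated assumptions: $I_C(x)$ is read as the indicator of $x \in C$, $Q$ is taken as a product of per-bit probabilities, and the length-3 even-weight code and the numbers are invented for the example.

```python
import math
from itertools import product

def uap(bit_probs):
    """Product distribution Q over all binary words; bit_probs[i] = P(bit i = 1)."""
    n = len(bit_probs)
    return {
        x: math.prod(p if b else 1.0 - p for b, p in zip(x, bit_probs))
        for x in product((0, 1), repeat=n)
    }

def constrain(q, code):
    """Q o I_C: keep the mass of Q on the code words and renormalise."""
    mass = sum(q[x] for x in code)
    return {x: (q[x] / mass if x in code else 0.0) for x in q}

def divergence(p, q):
    """Directed divergence D(P || Q) in nats, with 0 ln 0 taken as 0."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0.0)

# Toy example (assumption): even-weight code of length 3.
code = {x for x in product((0, 1), repeat=3) if sum(x) % 2 == 0}
q = uap([0.9, 0.6, 0.3])
p = constrain(q, code)

code_mass = sum(q[x] for x in code)
decoded = max(p, key=p.get)      # code word that maximises P
d_pq = divergence(p, q)
```

For this choice of $P$ the divergence equals $-\ln \sum_{x \in C} Q(x)$, which gives a quick consistency check on the renormalisation.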


II. Iterated decoding

Consider binary block codes which can be described as the
intersection of two supercodes, i.e. $C = C_1 \cap C_2$ (two-fold product
codes and the turbo-codes can be expressed in this way).
The iterated decoding schemes proposed for these codes [2, 3]
can be formally described as follows.

Given a distribution $P$, let $\bar{P}$ denote the distribution
obtained as the product of the marginals of $P$. Let $Q$ denote
as usual the UAP distribution. Then let $\bar{P}^{2,0} = Q$, and for
$\ell = 1, 2, 3, \ldots$ iterated decoding consists of the sequence of
minimization problems

$$P^{1,\ell} = \bar{P}^{2,\ell-1} \circ I_{C_1}, \qquad P^{2,\ell} = \bar{P}^{1,\ell} \circ I_{C_2}. \qquad (3)$$

III. Some conjectures

We conjecture that iterated decoding is equivalent to
typical-set decoding. It is known that typical-set decoding is
asymptotically optimal, in the sense that it achieves channel
capacity. Given the received word $y$, the typical-set decoder
outputs code word $x_0$ if this is the unique code word $x \in C$
such that $(x, y)$ are jointly typical, otherwise it outputs an
error message.

Assume that we receive a typical $y$ (if the received sequence
is not typical we are done, since in any case we have
an error). The number of code words $x$ jointly typical with $y$
is, on the average, $2^{nH(X|y)}$, where $H(X|y)$ denotes the
conditional entropy rate of $x$ given $y$. These are the words “of
high probability” when $y$ is given, and, by the asymptotic
equipartition property, they have roughly the same probability
$\approx 2^{-nH(Q)}$, where $H(Q)$ denotes the entropy of the UAP
distribution $Q$. Let $A(y)$ denote the set of those sequences,
and $A^{i,\ell}(y)$ the set of the sequences $x$ jointly typical with $y$
under the new conditional distribution $P^{i,\ell}$ obtained at the
$\ell$-th step of the iteration ($i = 1$ or 2). The three parts of the
conjecture are as follows:

1. If $C_1 \cap A(y) \ne \emptyset$, then $|A^{1,1}(y)| < |A(y)|$. This is equivalent
to saying that, if some sequences $x \in C_1$ are typical under
distribution $Q$, then $H(Q) > H(P^{1,1})$.

2. Suppose that both $A(y) \cap C_1$ and $A(y) \cap C_2$ are not empty.
Then

$$|A^{1,1}(y) \cap C_1| \approx |A(y) \cap C_1|$$

$$|A^{1,1}(y) \cap C_2| < |A(y) \cap C_2|$$

In other words, the typical sequences in $C_1$ under $Q$ are still
typical under $P^{1,1}$. Hence, since the size of the set must
decrease, we must delete some sequences from $A(y)$ which are
not in $C_1$. In particular, we can throw away some sequences
of $C_2$ (which are not also in $C_1$).

3. If $C$ is a good code, there will be at most one sequence
$x_0 \in C$ jointly typical with $y$ (otherwise we would have an
error in any case). If $A(y) \cap C = \{x_0\}$, then
$A^{1,\infty}(y) = A^{2,\infty}(y) = \{x_0\}$ and the iterations converge to the
distribution

$$P^{1,\infty} = P^{2,\infty} = I_{x_0}(x)$$

1 This research was sponsored by the Italian National Research
Council (CNR) under “Progetto Finalizzato Trasporti.”
Abstract — We present an iterative soft-output decoding
algorithm for serially concatenated convolutional
codes which has better performance than the
conventional noniterative decoding algorithm. The
proposed decoding scheme can be used whenever
some form of serial concatenation of encoders and
channels with memory is applied.

The figure shows the block diagram for a serially concatenated
coding system with iterative soft-output Viterbi decoding.
The binary data sequence $\{a_i\}$ is fed into the outer encoder.
The binary sequence $\{b_j\}$ at the output of this encoder
is interleaved to result in $\{c_k\}$. This sequence is then serially
encoded by the inner encoder into the sequence $\{d_l\}$. The
sequence $\{d_l\}$ is sent over a Gaussian channel and produces at
its output the noisy sequence $\{y_l\}$, where $y_l = d_l + n_l$. Here
$\{n_l\}$ denotes an additive white Gaussian noise sequence with
zero mean and variance $\sigma^2$.

In  the  first  stage  of  the  m-th  iteration  of  the  decoding  al¬ 
gorithm,  soft  information,  A^m),  k  =  ...,-1,0, 1,...,  associ¬ 
ated  with  the  estimated  symbols  k  =  ...,—1,0,1,..., 

are  computed  by  the  simplified  version  of  the  SOVA  [1]  taking 
into  account  the  intrinsic  contributions  of  the  outer  decoder 
from  the  previous  iterations.  This  algorithm  is  referred  to  as 
the  inner  SOVA.  Let  71  denote  the  trellis  which  represents 
the  structure  of  the  inner  encoder.  The  metric  adopted  by 
this  stage  is  the  Euclidean  metric 

E( » -  f<)2  -  D2**  -  i)E(f(n)Aln)>> 

1  k  71 

where  Xk  and  &  are  respectively  the  inputs  and  noiseless  out¬ 
puts  in  an  arbitrary  path  with  the  same  position  as  yt  in  Ti 
and  n  goes  over  all  previous  iterations.  For  a  given  SNR,  the 
positive  parameters  are  arbitrary  and  should  be  chosen  to 
minimize  the  bit-error-probability  of  the  sequence  {a*}  at  the 
end  of  the  iterative  decoding  process.  are  the  intrinsic 

contributions  of  the  outer  SOVA  as  defined  below.  The  soft- 
information  variables  A^  represent,  up  to  a  multiplicative 
factor,  an  approximation  of  the  log-likelihood  ratios 

$$\ln \frac{P(c_k = \text{“0”} \mid \{y_l\})}{P(c_k = \text{“1”} \mid \{y_l\})}.$$

The  sequence  of  estimated  symbols  and  the  associated  reliabil¬ 
ity  information  are  deinterleaved  using  the  reverse  procedure 
of  the  block  interleaver.  The  resulting  sequences  are  denoted 
by  {^m)}  and  respectively. 

The outer decoder also uses a simplified SOVA. It applies
the  structure  of  the  outer  encoder  trellis  to  the  sequence  de¬ 
livered  by  the  inner  decoder.  Therefore,  it  provides  enhanced 

1  Author  to  whom  correspondence  may  be  addressed.  E.  Pap¬ 
proth  was  supported  by  a  DAAD-fellowship  HSP  II  financed  by  the 
German  Federal  Ministry  for  Research  and  Technology  (BMFT). 


soft  outputs  for  the  sequence  received  from  the  inner  decoder 
f  associated  with  the  new  estimated  symbols  V'™'^ .  This  al¬ 
gorithm  is  referred  to  as  the  outer  SOVA.  The  metric  adopted 
by  this  stage  is  the  simple  correlation  metric 

-  Em  -  i)(2^m)  -  i)f<ra), 

i 

where  f3j  £  {“0”,  “1”}  is  an  arbitrary  path  symbol  with  the 
same  position  as  in  the  outer  encoder  trellis  T0.  Once 

again,  the  enhanced  soft- information  variables  f ^  represent, 
up  to  a  multiplicative  factor,  an  approximation  of  the  log- 
likelihood  ratios 

pp>i  =  “oi{^r)},{f<r>» 

This  second  stage  exploits  the  fact  that  the  sequence 
when  correct,  must  be  a  codeword  sequence  of  the  outer  en¬ 
coder.  Denote  respectively  by  and  {A^m*}  the  inter¬ 
leaved  versions  of  the  sequences  and  {fy*^}.  The  in¬ 

trinsic  contribution  of  the  outer  SOVA  in  comparison  with  the 
inner  one  is  measured  by  the  difference 

4m)  =  (24m)  -  l)A(r>  -  (2 £<„"*>  - 

For  the  following  iteration,  the  new  soft-information  variables 
generated  by  the  inner  SOVA  and  its  associated  decisions  are 
denoted  by  Aj™+D  1  > ,  respectively. 

At  the  final  /- th  iteration  of  the  iterative  decoding  process, 
an  outer  conventional  Viterbi  decoder  delivers  the  estimated 
sequence,  {di},  of  the  information  sequence.  The  metric  used 
by  this  decoder  is  —  —  l)(2&^  —  l)t^K 
Abstract — This paper presents a suboptimum soft-decision
decoding scheme for binary linear block codes
based on an iterative search algorithm using a purged
trellis diagram. The scheme achieves near optimum
error performance with a significant reduction in
computation complexity.

I.  Summary 

The proposed scheme uses a hard-decision decoder to
produce a candidate codeword and exploits the fact that the
hard-decision decoded codeword is either the optimum maximum
likelihood decoding (MLD) solution or at a distance not
too far away from the optimum MLD solution. As a result, the
optimum MLD solution may be found by searching through
those codewords that are close to the candidate codeword.
The search is conducted through a purged trellis diagram
for the given code. If the optimum MLD solution is not found,
a new candidate codeword is generated by using a new test
error pattern to modify the hard-decision received sequence.
Then the optimality test is repeated and a new search begins.
Generation of new candidate codewords and searches repeat
until either the optimum MLD solution is found or the decoding
process is terminated by exhausting all possible test error
patterns.

Sufficient  conditions  for  optimality  are  proved.  Upper 
bounds  on  the  Hamming  distance  between  a  hard-decision  de¬ 
coded  candidate  codeword  and  the  optimum  MLD  solution  are 
derived  [1].  These  upper  bounds  define  a  search  region  for 
the  optimum  MLD  solution.  The  proposed  decoding  scheme  is 
simulated  for  some  well  known  codes.  The  simulation  results 
show  that  the  proposed  decoding  scheme  achieves  either  prac¬ 
tically  optimum  performance  or  a  performance  only  a  fraction 
of  a  dB  away  from  MLD  with  a  significant  reduction  in  de¬ 
coding  complexity  compared  with  the  Viterbi  decoding  based 
on  the  full  trellis  diagrams  of  the  codes  [3]. 

II.  Examples 

Consider the (23,12,7) Golay code. The proposed decoding
algorithm achieves practically optimum performance. At
SNR = 4 dB, the proposed decoding algorithm achieves a
bit-error-rate (BER) of $10^{-3}$ and requires fewer than 100 binary
operations (including additions and comparisons for the
purged trellis search). The average number of iterations
required to decode a received sequence at SNR = 4 dB is 0.9. At
SNR = 6 dB, the proposed decoding algorithm achieves a BER
of $10^{-6}$ with an average number of binary operations less than 15,
and the average number of iterations required to decode a
received sequence is 0.2. By contrast, the optimal Viterbi decoding
algorithm based on the full trellis diagram of the code requires
a fixed number of 2,559 binary operations. The most efficient
optimum decoding algorithm for the (24,12,8) extended Golay
code proposed so far requires at least 550 but no more
than 651 binary operations [2]. For SNR greater than 3 dB,
the proposed decoding algorithm requires far fewer computations
than the optimum decoding proposed in [2].

Next, we consider the (32,21,6) extended primitive
BCH code. The proposed decoding algorithm again achieves
practically optimum error performance. It achieves a BER
of $10^{-5}$ at SNR = 5.4 dB. At this SNR, the average number
of binary operations and the average number of iterations
required to decode a received codeword are 25 and 0.5
respectively, whereas the optimum Viterbi decoding based on
the full trellis diagram of the code would require 30,156 binary
operations. Even for the worst case, the proposed decoding
algorithm requires a maximum of no more than 6,350
binary operations. We see that for the (32,21,6) extended
BCH code, the proposed decoding algorithm achieves practically
optimum performance with a significant reduction in
computation complexity.

Using the proposed algorithm to decode the (64,45,8)
extended BCH code, there is a 0.5 dB loss in coding gain at
a BER of $10^{-5}$ compared with the optimum MLD. The SNR
required to achieve a BER of $10^{-5}$ is 5.3 dB. At this SNR, the
average number of binary operations and the average number of
iterations required to decode a received word are 300 and 0.7
respectively. By contrast, optimum Viterbi decoding based on
the full trellis diagram of the code requires 4,301,823 binary
operations (this number can be reduced by certain permutations
of the order of bits in the trellis). Even for the worst
case, the proposed decoding algorithm requires only a maximum
of 57,182 binary operations. We see that a tremendous
reduction in computation complexity is achieved with only a
small performance degradation.
Concatenated codes are a powerful means for improving
the performance of digital communication systems over
extremely noisy channels. Such a code contains an inner code,
which is often a convolutional code, and an outer code. Since
the conventional inner decoder only gives hard outputs (i.e.,
0 or 1 for binary codes), the outer decoder is forced to work
in the HDD (hard decision decoding) fashion. Reliability
information is needed to fully utilize the SDD (soft decision
decoding) capacity of the outer code. Even in cases where no
practical SDD algorithms exist for the outer codes (e.g.,
Reed-Solomon codes), reliability information can help to erase
highly unreliable bits, and the performance can be improved
through errors and erasures decoding [1]. A decoder capable of
delivering such reliability information is called a soft output
decoder.

The reliability measure for a decoded symbol is the probability
$P_c$ that the symbol is correct or the probability of error
$P_e = 1 - P_c$. Such quantities can be obtained by the
symbol-by-symbol MAP (maximum a posteriori probability)
algorithm. Unfortunately this algorithm is computationally
inefficient. A soft output Viterbi algorithm (SOVA) [2] can
provide an estimate of $P_e$ which is accurate only for large
SNR.

This paper proposes an efficient modified MAP algorithm
for obtaining $P_c$ for the outputs of convolutional inner
decoders. The outer decoder uses $P_c$ to perform SDD by choosing
a codeword $y = (y_0, y_1, \ldots, y_{l-1})$ which maximizes the
maximum likelihood (ML) metric $\mu(y) = \sum_{i=0}^{l-1} \ln p(y_i)$,
where $p(y_i)$ is the probability $P_c$ that symbol $y_i$ is correct.
Decoding based on this ML metric is referred to as Generalized
SDD since it includes the Euclidean metric on AWGN
channels and the one proposed in [2] for binary memoryless
channels as special cases. The following theoretical and
implementation issues are also investigated:

Convergence. Practical decoders have to have a finite
decoding delay (decoding depth) $\Gamma$, which causes truncation
errors for very long or infinite data streams. By a matrix
formulation of the MAP algorithm, $P_c$ can be seen to be a function
of the products of infinite random matrices. The theory of
products of random matrices (PRM) is then used to show
that the estimated $P_c$ over a finite $\Gamma$, $P_c(\Gamma)$, converges to $P_c$
exponentially fast with $\Gamma$. Thus the truncation error can be
made arbitrarily small.

Complexity. The VA is the most efficient hard output
convolutional decoder. A soft output decoder is expected to
have more complexity because it extracts more information
from the inputs. It is informative and of practical significance
to use the Viterbi decoder as a complexity measure against
other decoders. It is demonstrated that the MAP soft output
decoder has a complexity of $\Gamma + 1$ times that of the Viterbi
decoder.

Decoding delay. The $\Gamma$ required for a fixed level of
accuracy changes bit by bit, as well as with channel conditions
such as SNR. A fixed delay would have to be very long to
accommodate the worst case, which increases the complexity
and is unnecessary most of the time. Using the PRM theory,
the problem of obtaining $P_c$ can be reformulated as a problem
of best fit between two vectors. Solutions of the best fit
problem provide a scheme which keeps $\Gamma$ at a minimum for
the required precision. The scheme makes the modified MAP
algorithm very efficient. Figure 1 shows the average delay
$\Gamma$ versus SNR for a rate-1/2 code with constraint length 3. $\Gamma$
is kept below 5 over the entire operating region of the code
and rapidly drops to zero. The algorithm is as efficient as the
VA for SNR as low as 3 dB.

Fig. 1: Average decoding delay

Fig. 2: Comparison of $P_e$ and $\hat{P}_e$

Range overflow. This phenomenon was shown to occur
very easily during the decoding process. To solve this problem,
it is shown that the relative amplitudes among the quantities
under consideration are bounded, and thus a very simple and
effective scaling scheme is constructed.

Finally, a comparison is made between the exact $P_e$
provided by the modified MAP algorithm and the approximation
$\hat{P}_e$ in [1, 2]. It is shown that the approximation gives an
optimistic estimate of $P_e$, especially for low SNR. The result is
plotted in Figure 2 for 50,000 consecutive bits at SNR = 2
dB, using the same code mentioned above. The discrepancy
becomes smaller for increasing SNR.
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We consider the performance of quadratically detected
heterodyned lightwave signals in the presence of the laser's
relaxation oscillations. Here the limiting form of the channel
presents both additive white Gaussian noise (AWGN) of spectral
density N0 and laser phase noise. The widely accepted model for
the laser phase noise is a Brownian motion giving rise to a
Lorentzian line spectrum [1]. However, the actual line shape of
semiconductor lasers deviates from this simplified and
idealized statistical characterization [2]. The analytical
techniques provided here show that this deviation may have a
significant impact on communication-system performance.

The major difference stems from the laser's relaxation
oscillations, which induce periodic satellite peaks in the line
spectrum. The resulting phase noise is characterized by a
normalized zero-mean Gaussian process with autocovariance
function

E[\phi_t \phi_s] = \min(t, s) + \mathrm{Real}\{ \ldots \} .   (1)

The term min(t, s) gives rise to a perfect Lorentzian spectrum.
The second term represents a deviation from the Brownian motion
phase model. Here ξ_r = (π B_L t_r)^{-1}, where B_L is the
underlying "Brownian" linewidth and t_r is the decay time
constant, and ν_r = Ω_r/(π B_L), where Ω_r is the resonance
angular frequency. B is a complex constant depending on the
laser's parameters.

The receiver in focus here comprises L-fold square-law
detection of L filtered noisy phase frequency shift keying (FSK)
signals observed in AWGN. The underlying decision statistics
depend on the laser phase noise via normalized exponential
functionals of the form:


f  e3^4"  ds 

2 

> 

r,  =  f  e3^4”  e 3('s  ds 

J  O 

Jo 

where ε_t corresponds to the in-band received signal and Γ_t
renders a crosstalk signal. Here δ is the normalized frequency
spacing between the FSK signals.

The joint statistics of the phase noise functionals (2) are
unknown, even for the simplified Brownian motion model [1].
Assuming L-fold diversity reception (L statistically
independent and identically distributed observed noisy phase
signals), which can be achieved via interleaving techniques, a
power-moment statistical characterization of ε_t and Γ_t is a
useful approach. Indeed, in the case of a Brownian motion,
tight bit error rate (BER) bounds are achievable with the use
of a few moments, for optimized system parameters [3].

Following [3], the application of the theory of limiting values
of integrals, the Hölder inequality and the Chernoff bound
yields upper bounds on the bit error probability. The bounds
are given in terms of the power moments induced by the phase
noise functionals ε_t and Γ_t, the computation of which is
required. Noting that the joint power moments featured by the
Markovian Brownian motion functionals are analytically
tractable [3], the considered power moments of ε_t and Γ_t are
related to certain mutual moments governed by a Brownian
motion. For illustration, the first-order moment of ε_t at
t = β is given by

oo 

££/3  =  e-Real<B>  Ar/>((l,r)(-l,r))  ,  0  <  0  <p  (3) 

T'=Z  — OO 

where  Ep(-)  is  obtained  via  the  inverse  Laplace  transform  of 
IFs-(-)  stated  below  at  t  =  /?, 

Fs  ((/i,  Wi)(Ja,  W2))  =  2  (S  +  1  +  jl  hW,  +  ji  I2W 2) 

(s  +  (/1  + /2)2^  ^5  +  jh  W\  +  J/2W2)  (s  +  1  4-  jltW? 

(s  +  l+iWj)-1. 

(4) 

The {A_r} are Fourier series coefficients, computed on the
interval [−β, β], which are strictly determined by the
relaxation oscillation parameters ξ_r and ν_r. Similar
expressions are obtained in general for higher order moments.

The resulting upper bounds on the BER are determined by the
various system parameters: the laser linewidth-to-bit-rate
ratio B_L/R, the bit-energy-to-noise-density ratio Eb/N0, the
diversity level L (or equivalently the IF bandwidth expansion
relative to the bit rate), the laser's relaxation oscillation
normalized decay time ξ_r and the laser's relaxation
oscillation normalized resonance frequency ν_r. Orthogonal
reception in the absence of phase noise is assumed, namely
ΔΩ/R = 2π·δ, where ΔΩ is the frequency spacing, R is the bit
rate, and δ is a positive integer. The impact of the laser's
relaxation oscillation on the obtained upper bounds is studied,
and it is shown that it may result in a significant penalty
relative to the simplified Brownian motion model. For example,
assume BER = 10^-9, B_L/R = 1, δ = 4, L = 25, ν_r = 53.9,
ξ_r = 2.3. Then a relative penalty of nearly 9 dB is predicted.
Increasing the IF bandwidth to L = 30 would result in a
decreased penalty of 3.5 dB.
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Abstract — A new detection scheme is proposed for optical
pulse position modulation (PPM) communication systems. The
channel properties of the proposed scheme are clarified
theoretically.

I.  Introduction 

This paper proposes a new detection scheme for M-ary optical
PPM signals. It is shown that the proposed scheme performs
almost optimally under the error probability criterion. The
channel capacity of the proposed scheme is compared with that
of other detection schemes.

II.  New  detection  scheme 

The block diagram of the proposed receiver is shown in Figure
1. The receiver consists of a local laser, a highly
transmissive beam splitter, a photodiode and a feedback control
system for the local laser. The frequency of the local field is
identical to that of the signal field, its phase is π-shifted
with respect to the signal, and its amplitude is set so that
its reflected part is the same as the transmitted part of the
signal. Then, if the local laser is on, it cancels out the
signal field perfectly in the combination process at the beam
splitter. At the beginning of each symbol, the local laser is
on. The photon number of the combined field is counted for each
time-slot individually. If no photon is counted during a
certain, say "ith", time-slot, the feedback control system
switches the local laser off from the next time-slot. If no
other photons are counted after that until the end of the
symbol, the symbol having an optical pulse at the ith
time-slot, m_i, is decided as the transmitted symbol. On the
other hand, if some other photons are counted in the jth
time-slot (j > i), the symbol m_j is selected. In the above
operation, when a symbol m_j is transmitted, an error occurs if
no photon is counted in a certain, say "ith" (i < j),
time-slot, and if no photon is counted in the jth time-slot.
The probability of this error depends on the symbol as follows:

P_e(m_j) = e^{-N_s} \{ 1 - (1 - e^{-N_s})^{j-1} \}   (1)

Averaging these symbol-dependent error probabilities with
respect to the a priori probabilities, we obtain the average
error probability. For equally probable symbols (probability
1/M), the average error probability is given as follows:

P_e = \frac{e^{-N_s}}{M} \sum_{j=1}^{M} \{ 1 - (1 - e^{-N_s})^{j-1} \}   (2)
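The two error probabilities above can be checked numerically. The following is a minimal sketch, assuming the per-symbol form e^{-Ns}{1 - (1 - e^{-Ns})^{j-1}} with slots numbered from 1; the function names are illustrative and do not come from the paper. The closed form in the last function follows from summing the geometric series in the average.

```python
import math


def symbol_error_probability(j: int, ns: float) -> float:
    """Error probability when the pulse sits in slot j (1-based), mean count ns."""
    p_zero = math.exp(-ns)  # probability of counting no photon in a lit slot
    # Error: some earlier slot shows no photon AND the pulse slot shows no photon.
    return p_zero * (1.0 - (1.0 - p_zero) ** (j - 1))


def average_error_probability(m: int, ns: float) -> float:
    """Average over M equally probable symbols."""
    return sum(symbol_error_probability(j, ns) for j in range(1, m + 1)) / m


def average_error_probability_closed_form(m: int, ns: float) -> float:
    """Same average after summing the geometric series explicitly."""
    p_zero = math.exp(-ns)
    return p_zero - (1.0 - (1.0 - p_zero) ** m) / m


if __name__ == "__main__":
    for ns in (1.0, 2.0, 4.0):
        print(ns, average_error_probability(64, ns))
```

The first symbol (j = 1) can never be mistaken in this model, since there is no earlier slot in which the local laser could be switched off by mistake.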


III.  Numerical  results 

Error performance and channel capacity of the proposed scheme
are shown as a function of signal energy Ns for symbol length
M = 64 in Figures 2 and 3, respectively. Those of the optimum
quantum receiver [1] and the direct detection receiver are also
shown. Figure 2 shows that the proposed scheme is superior to
the direct detection scheme in error performance, and that it
performs almost optimally. Figure 3 also shows the superiority
of the proposed scheme, especially for Ns over 6 dB. These
results suggest that the proposed detection scheme can be
expected to support communication at extremely low signal
energy.
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Figure  1:  Block  diagram  of  proposed  detection  scheme. 
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Figure 2: Error performance for symbol length M = 64.
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Figure  3:  Channel  capacity  for  symbol  length  M  =  64. 
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Differential overlapping pulse position modulation (DOPPM)
can achieve higher capacity and cutoff rate than PPM, DPPM and
OPPM when the pulsewidth and the average power of the channel
are constrained [1]. In [1], the loss of a single pulsed chip
is defined as an erasure event even when the word can still be
decoded correctly. This results in loose lower bounds on the
performance.

This summary derives tighter lower bounds on the capacity and
cutoff rate of DOPPM in the optical direct-detection channel.
Considering which pulsed chips must be detected to decode the
words correctly, we derive the transition probability of DOPPM
words and the tighter lower bounds.

We analyze the performance of DOPPM under the window scheme
[1]. In a given window of length L (chips), we attempt to send
Wd symbols of DOPPM with N chips consisting of Q-ary PPM:
L < WdQN. We specify the window scheme as follows: only if we
detect photons at both ends of each pulse, for all the pulses,
can we correctly detect any sequence fitting in the window. In
particular, for the pulses generated contiguously from the left
or right end of the sequence, we can decide the positions of
the pulses only if we detect photons at the left or right end
chip of each pulse, respectively. If we fail to detect photons
at the chips needed for correct decoding, we consider the
entire sequence to be garbled and define this sequence as an
erasure sequence.

We denote the probability of using any one of the M symmetric
inputs by P(x_i) = α and that of not sending a sequence by
P(x') = β, with Mα + β = 1. The mutual information can be
derived as

wd  wd-pLXi 

I(x;y)  =  Y,  Y  a-S(PL*i’PR*i) 

PLxi=0  PRx.=  0 

.  {.o,  (1 

where 

wd  wd~pL 

^  =  /?+Y  Y  a-S(Pi’PR)U-pc(pL,PR)]  (2) 

Pl=o  ph=o 

and P_c(P_L, P_R) and S(P_L, P_R) are the correct transition
probability and the number of symbols having P_L and P_R pulses
generated contiguously from the left and right ends of the
block, respectively.


To calculate the cutoff rate of the channel, we use the
formula [1]: E_X[J] is derived as


wd  wd-PLxi 

EX[J]  =/?2  +  Y  Y  «  •  S(PLxi ,  PRxi) 

pLxi~o  PRxi~o 
Wd  Wd-PLxj 

Y  Y  vsip^p***) 

LPLxj=  0  PKlj=  0 

Vu  -  pja^p^jhi  - 

+  a  ■  PciPL^PR*, )  +  (3) 


Figure 1 shows the optimal capacity per slot of Q-ary PPM,
DPPM, (Q, N) OPPM and DOPPM. It can be seen that DOPPM with the
new rule achieves higher capacity than PPM, DPPM, OPPM and
DOPPM with the conventional rule [1]. This is because some
events that are defined as erasures in [1], but can be decoded
correctly, are not defined as erasure events under the new
rule. Similar trends can be seen in the cutoff rate
performance.


Fig.  1:  Optimal  capacity  per  slot  [nats/slot]  vs.  average  number  of 
photons  per  slot  s *  [photons/slot]. 
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Abstract  —  In  this  paper,  we  evaluate  the  perfor¬ 
mance  of  a  two-user  optical  communications  system 
over  a  noncoherent  optical  channel.  The  two  users 
are  separated  in  average  received  energy  and  are  di¬ 
rectly  interfering  with  each  other.  We  find  expres¬ 
sions  for  uncoded  bit  error  probability  and  codeword 
error  probability  for  a  particular  scheme. 

SUMMARY 

An information source outputs random binary data in {0,1} with
equal probability. During a time interval T, the laser of the
j-th transmitter, j = 1, 2, is amplitude modulated by the data.
Both transmitters use the same optical channel to communicate
with a central receiver. At the receiver, the laser light is
detected noncoherently with a photo-detector to count the
photoelectrons.

We assume that photon arrival is due to the transmitting
laser(s) only. The photon channel is a Poisson channel: if a
positive real number x (the average count) is transmitted, the
probability that the integer k is received is Poisson
distributed according to

P(k; x) = e^{-x} x^k / k!

Thus the discrete channel seen by a transmitter-receiver pair
is a Z-channel. Each user has available a distinct Z-channel.

The decoder outputs the information bits based on a maximum
likelihood estimate of the transmitted bits given the output of
the photo-detector. Let the average energy available to user i
be E_i (photons); then the decoder finds

arg max_{b(1), b(2)} Pr{m | b(1), b(2)},

where m is the photon count at the output of the
photo-detector, and b(j) is the bit transmitted by user j,
j = 1, 2. The decision regions consist of (0, L], (L, U], and
(U, ∞), where


L = \left\lfloor \frac{E_2 - E_1}{\log(E_2/E_1)} \right\rfloor ,
\qquad
U = \left\lfloor \frac{E_1}{\log(1 + E_1/E_2)} \right\rfloor .

The  probability  of  bit  error  for  user  j  is  evaluated  for  an  un¬ 
coded  system  and  for  a  coded  system. 
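The maximum-likelihood rule described above can be sketched numerically. This is an illustrative sketch only, assuming on-off keying in which a transmitted 1 from user i contributes a Poisson mean of E_i and a 0 contributes nothing, so that the four bit pairs give means 0, E_1, E_2 and E_1 + E_2. The function names and the example energies (5 and 20 photons) are hypothetical and are not taken from the paper.

```python
import math


def poisson_log_likelihood(m: int, mean: float) -> float:
    """log P(m; mean) for a Poisson count; a zero mean forces m = 0."""
    if mean == 0.0:
        return 0.0 if m == 0 else -math.inf
    return -mean + m * math.log(mean) - math.lgamma(m + 1)


def ml_decision(m: int, e1: float, e2: float) -> tuple:
    """Return the bit pair (b1, b2) that maximizes Pr{m | b1, b2}."""
    pairs = [(0, 0), (1, 0), (0, 1), (1, 1)]
    return max(pairs, key=lambda b: poisson_log_likelihood(m, b[0] * e1 + b[1] * e2))


def bit_error_probability(user: int, e1: float, e2: float) -> float:
    """Uncoded bit error probability of user 1 or 2 with equiprobable bits."""
    total = e1 + e2
    m_max = int(total + 12.0 * math.sqrt(total) + 30)  # truncation of the count range
    error = 0.0
    for b1, b2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        mean = b1 * e1 + b2 * e2
        for m in range(m_max + 1):
            decided = ml_decision(m, e1, e2)
            if decided[user - 1] != (b1, b2)[user - 1]:
                error += 0.25 * math.exp(poisson_log_likelihood(m, mean))
    return error


if __name__ == "__main__":
    print(bit_error_probability(1, 5.0, 20.0), bit_error_probability(2, 5.0, 20.0))
```

For E_1 < E_2 the pairwise likelihood comparisons give crossover counts at (E_2 - E_1)/log(E_2/E_1) and E_1/log(1 + E_1/E_2), which is how the decision boundaries of the brute-force search can be cross-checked.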

A close look at the error rate expressions and Fig. 1 indicates
that E_1 and E_2 should not be equal or close together, as this
yields maximum interference. On the other hand, if E_1 and E_2
are too far apart, the user with high energy will suffer from
large variance, since for the Poisson process the mean and
variance are equal (to the average energy). For this example,
E_1 + E_2 = 16 dB, and the users have equal performance at
E_2 = 12.5 dB.

Now consider the use of a t-error-correcting (n, k) code. Each
user pulses between the two Z-channels by alternating
transmission. In this case the average transmitted energy E per
codeword and per user is the same, and, therefore, both users
will have the same probability of error.


Figure 1: Uncoded performance for E_1 + E_2 = 16 dB.

With E_1 and E_2 as defined earlier, the codeword error
probability P_w(E_1, E_2) is evaluated. The optimal energy
levels are given by

(E_1^*, E_2^*) = arg min P_w(E_1, E_2),

where the above minimization is over (E_1, E_2) with the
following constraints

E_1 > 0, E_2 > 0, E_1 + E_2 = E.

It is not obvious which energy levels result in the minimum
error rate. The above expression is evaluated numerically for
each value of E. Fig. 2 shows typical behaviour of P_w as a
function of the separation between energy levels, for a
particular code of block length n = 20 and t = 4.


Figure  2:  Codeword  error  rate  as  a  function  of  separation 
between  energy  levels. 
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Abstract — We analyze the bit-error rate (BER) of an optical
communication system using the superfluorescent fiber source
(SFS). The counting statistics of thermal light give improved
performance relative to the Gaussian statistics, which predict
a BER floor.


Summary 

Consider a spectrum-sliced wavelength-division multiple-access
(WDMA) system that employs the SFS [1]. In the SFS, spontaneous
atomic emission is amplified through a rare-earth doped,
single mode optical fiber end-pumped by an external laser. The
incoherent, broad-bandwidth output is best modeled as thermal
light. The output is spectrum-sliced, then on-off modulated by
a binary symbol stream, resulting in an intensity-modulated
waveform arriving at the photodetector. Neglecting the dark
current and thermal noise, we obtain the BER as the Laplace
transform of the integrated intensity W:


BER = \frac{1}{2} \int_0^\infty e^{-\alpha w} p_W(w) \, dw ,   (1)


where p_W(w) models the stochastic fluctuation of the light and
α = η/hν (η is the quantum efficiency and hν the photon
energy). Using the negative binomial statistics for the
photoelectron count [2], the BER and the signal/noise ratio are:


BER 

SNR 


M 


Pc/wP  +  .5(1  -F  T2)  * 


(2) 

(3) 


The mode number, M, is the ratio of the symbol duration T to
the coherence interval T_c of the incident light. That is,
M = B_0/2B_e, where B_0 = 1/T_c and B_e = 1/2T are the optical
and detection bandwidths, respectively. P = E[W]/T is the
received power, P_c = hν/T_c is the coherence power of the
photon and 𝒫 the degree of polarization. The limiting SNR is
B_0/(1 + 𝒫²)B_e for high received power (ηP/P_c ≫ 1).

The BER approaches the shot-noise limited value of
0.5 e^{-αE[W]} when the count degeneracy parameter, ηP/P_c, is
much smaller than unity. Since the BER decreases monotonically
with T_c, a Lorentzian spectral shape can have a lower BER at a
higher symbol rate compared to a Gaussian shape with the same
power and 3 dB linewidth. The ideal rectangular linewidth has
the worst performance. This must be considered against the
channel crosstalk since, not surprisingly, the tail of a
Lorentzian has the slowest spectral decay. Without the
linewidth constraint, we obtain a rather interesting
theoretical result that the shot-noise performance is achieved
with a spectral shape of infinite linewidth and zero spectral
height. In comparison, this is also achieved with an ideal,
coherent laser of zero linewidth and infinite spectral height.


1  This  work  was  supported  by  the  Advanced  Technology  Program 
of  the  Texas  Higher  Education  Coordinating  Board,  GTE,  Inc.,  the 
U.S.A.F.  Phillips  Laboratory  and  its  Palace  Knight  program. 


In spectrum-sliced WDMA, the maximum SNR is inversely
proportional to the number of channels. Equation (2)
demonstrates that the BER is not determined solely by the SNR,
which reaches a limiting value; increasing the spectral
intensity of the light reduces the BER. In fact, the number of
channels can be increased by increasing the received power,
while maintaining a desired BER for a given symbol rate. This
has important implications for spectral amplitude encoded CDMA
systems that require a large number of spectral chips [3].
Invoking the Gaussian assumption [1] would lead to incorrect
conclusions. For example, the Gaussian assumption predicts a
BER floor due to the limiting SNR and therefore implies that it
would be impossible to increase the number of channels by
increasing the power once the limiting SNR has been reached.

One can show that the BER is lower bounded by using the fact
that e^{-αw} is convex over [0, ∞) and applying Jensen's
inequality to Eq. (1) to obtain
BER = 0.5 E[e^{-αW}] ≥ 0.5 e^{-αE[W]}. Accordingly, a light
source that achieves this bound can be considered ideal, i.e.
its intensity is deterministic: p_W(w) = δ(w − E[W]), as would
be expected. Figure 1 shows calculated BER values for the ideal
and spectrum-sliced fiber sources, and for the Gaussian
assumption. The Gaussian assumption predicts a BER floor and
underestimates the true performance.
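The Jensen bound can be illustrated with a short numerical sketch. It assumes, for illustration only, that the integrated intensity W of fully polarized thermal light with M modes is gamma distributed with shape M and mean E[W], whose Laplace transform is (1 + αE[W]/M)^{-M}; this model and the function names are assumptions made here, not expressions quoted from the abstract.

```python
import math


def ber_ideal(mean_count: float) -> float:
    """Jensen lower bound 0.5*exp(-alpha*E[W]); mean_count = alpha*E[W]."""
    return 0.5 * math.exp(-mean_count)


def ber_thermal(mean_count: float, modes: float) -> float:
    """0.5*E[exp(-alpha*W)] for a gamma-distributed W with `modes` degrees (assumed model)."""
    # log1p keeps the expression accurate when mean_count/modes is small.
    return 0.5 * math.exp(-modes * math.log1p(mean_count / modes))


if __name__ == "__main__":
    for mean_count in (10.0, 20.0, 40.0):
        print(mean_count, ber_ideal(mean_count), ber_thermal(mean_count, 36))
```

As the number of modes grows, the thermal-light value approaches the deterministic-intensity bound, which matches the observation that a very broad spectral shape attains shot-noise performance.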


Fig. 1: BER comparisons at 1 Gbps. The spectrum-sliced SFS is
polarized (𝒫 = 1); M = 36, η = 50% at 1550 nm wavelength.
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Hybrid  fiber/coax  (HFC)  is  emerging  as  an  inexpensive 
architecture  for  providing  broadband  services  to  residences.  It 
has  optical  fibers  extending  from  the  central  office  or  head- 
end  to  remote  fiber  nodes.  Extending  from  the  fiber  nodes  to 
the  residences  is  a  coaxial  cable  distribution  bus. 
Multiplexing  allows  100  to  500  users  to  share  the  bandwidth 
of each coax distribution bus [1]. This architecture
advantageously  combines  the  long  range  of  optical  fiber  with 
the  high  bandwidth  and  simple  electrical  interfaces  of  coaxial 
cable.  HFC  will  initially  provide  telephony  and  cable  TV,  but 
it  also  has  sufficient  bandwidth  for  future  interactive  and 
multimedia  services.  Many  regional  telephone  companies, 
and  most  cable  TV  companies,  in  the  U.S.  have  committed  to 
HFC.  This  architecture  will  be  widely  deployed  well  into  the 
future,  while  the  demand  for  residential  bandwidth  will 
increase.  To  meet  this  demand,  there  will  be  an  increasing 
need  for  multi-user  information  theory  and  communication 
techniques  to  maximize  the  bandwidth  and  fully  exploit  the 
potential of this unique medium. This paper initiates
exploration into maximizing the channel capacity of HFC.

Capacity  calculations  can  safely  ignore  the  optical  link  and 
instead  focus  on  the  coax  distribution  bus.  The  coax  bus  has  a 
physical  tree-and-branch  architecture,  but  it  is  logically  a 
shared bandwidth bus. Receivers are spatially distributed
along the bus, with propagation distance l_i between the fiber
node and receiver i. The magnitude response of a receiver i is
|H_i| = Γ^{l_i √f}, where Γ is a constant and f is the
frequency. The transmitted signal is s(t), which is attenuated
by H_i and delayed by l_i/v, where v is the propagation
velocity. The signal received by the ith receiver, on carrier
frequency f, is r_i(t) = Γ^{l_i √f} s(t − l_i/v). The Fourier
transform of r_i(t) is
R_i(f) = Γ^{l_i √f} e^{−j2πf l_i/v} S(f), where j = √−1. Define
D = Γ^{√f} e^{−j2πf/v}. Then R_i(f) = D^{l_i} S(f). This
response is fundamental to capacity calculations.

Gaussian  noise  is  assumed.  For  a  single  user,  the  capacity  is 
easily  solved  with  the  classic  "water-filling"  spectral  density. 
However,  there  are  multiple  users,  and  the  capacity  is  a  multi¬ 
dimensional  function  with  dimensionality  equal  to  the 
number  of  users.  Communication  from  the  fiber  node  to  the 
users  is  similar  to  the  classic  Gaussian  broadcast  channel,  and 
from  the  users  to  the  fiber  node  is  similar  to  the  classic 
Gaussian  multiple  access  channel.  Classic  multi-user 
information  theory  assumes  that  the  channel  response  is  the 
same  to  each  user.  However,  users  on  the  coax  bus  are  located 
at  different  distances  from  the  fiber  node,  so  although  they 
each  see  the  same  superposition  of  signals,  they  each  have  a 
different  channel  response.  Unlike  classic  multi-user  capacity 
calculations,  here  the  ensemble  of  propagation  distances  is  a 
fundamental  new  variable.  This  problem  is  unsolved  in  its 
general  form. 
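For the single-user case mentioned above, the classic water-filling allocation can be sketched over a discretized band. This is a generic illustration rather than the calculation performed in the paper: the frequency axis is assumed to be split into equal-width bins, `noise_levels` is assumed to hold the effective noise N(f)/|H(f)|^2 in each bin, and all names are hypothetical.

```python
import math


def water_filling(noise_levels, total_power, iterations=100):
    """Allocate power max(0, mu - n) per bin, with mu found by bisection."""
    low = min(noise_levels)
    high = max(noise_levels) + total_power  # the water level cannot exceed this
    for _ in range(iterations):
        level = 0.5 * (low + high)
        used = sum(max(0.0, level - n) for n in noise_levels)
        if used > total_power:
            high = level
        else:
            low = level
    level = 0.5 * (low + high)
    return [max(0.0, level - n) for n in noise_levels]


def capacity_bits_per_second(noise_levels, powers, bin_width_hz):
    """Sum of per-bin Shannon capacities for Gaussian noise."""
    return sum(bin_width_hz * math.log2(1.0 + p / n)
               for p, n in zip(powers, noise_levels))


if __name__ == "__main__":
    noise = [1.0, 2.0, 5.0]
    powers = water_filling(noise, 3.0)
    print(powers, capacity_bits_per_second(noise, powers, 1.0))
```

Bins whose effective noise lies above the water level receive no power, which is why strongly attenuated high-frequency bins of a long cable run may go unused.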

Assumptions  are  made  to  get  the  results  here.  Only  two  types 
of  communications  are  considered:  information  that  is 
broadcast  to  all  users,  and  information  that  is  specific  to  only 


a  single  user.  Each  user's  specific  information  has  the  same 
bit-rate,  and  is  carried  in  a  single  distinct  interval  on  the 
frequency  axis.  The  capacity  of  the  user  specific  information 
transmitted  to  each  user  is  calculated  here.  Closer  users  are 
assigned  higher  frequencies  than  more  distant  users  are. 

Two  types  of  coax  bus  architectures  are  examined:  a  cable  TV 
type  of  coax  network  with  analog  amplifiers  and  attenuating 
taps,  and  a  passive  coax  network  with  ideal  band-pass  filter 
taps.  A  typical  suburban  tree-and-branch  coax  distribution 
bus  [2]  is  modeled  with  250  feet  of  0.625  inch  coax  between 
the  splitters  and  four-way  taps.  Users  connect  to  the  taps  with 
250  feet  of  coax  drop.  Each  user's  channel  response  is 
calculated,  and  with  typical  cable  transmission  parameters 
SNRs  (about  40  dB)  are  found  as  a  function  of  frequency. 
Downstream  digital  signals  are  restricted  to  the  450  to  1000 
MHz  band.  Using  Shannon  theory,  differential  entropies  and 
capacities  are  found,  then  frequency  assignments  are 
numerically  calculated  to  maximize  the  capacity,  which  is 
shown in Fig. 1. The sum total capacity is multiple Gbps.


Fig. 1. Shannon capacity dedicated downstream to each user
(horizontal axis: number of users).

There  are  simplifying  assumptions  here,  and  the  multi-user 
channel  capacity  of  the  shared  coax  bus  is  in  general  an 
unsolved  problem.  The  shared  upstream  channel  from  5  to  42 
MHz  has  much  radio  ingress,  causing  many  noise  spikes  in 
the  frequency  domain,  and  making  the  upstream  capacity 
calculation  more  difficult. 
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Abstract —  A  direct  and  efficient  method  for  evaluation  of 
the  error  probability  of  optical  heterodyne  receivers  in  the 
presence  of  phase  noise  is  presented. 

Closed  form  expressions  for  the  statistics  of  the  decision 
variable,  including  photodetector  shot  noise  and  thermal 
noise  from  electronic  circuitry,  are  shown. 


I.  Introduction 

The decision variable, in complex signal notation, of a
heterodyne optical system with an envelope detector receiver
has the form

|V|^2 = |Y + X|^2   (1)

where Y represents phase noise and X additive noise. The phase
noise is produced by the transmitting and local oscillator
lasers. The additive noise X is photodetector shot noise and
thermal noise from the electronic circuitry.

Further  background  and  derivation  of  the  formulas  can 
be  found  in  a  forthcoming  paper  by  Einarsson  et  al  [1]. 

II.  Amplitude-Shift  Keying 

To simplify the analysis, let the prefilter be a bandpass
integrator operating at the heterodyne frequency. During the
data symbol interval the prefilter output is sampled L times at
t = kT', k = 1, 2, ..., L, generating a sequence of complex
valued stochastic variables


Vk  =  ri.34  +  (2) 


where 

Y_k = \frac{1}{T'} \int_{(k-1)T'}^{kT'} e^{j\theta(t)} \, dt   (3)

is filtered phase noise and X_k, filtered white noise, is a
complex valued, zero mean Gaussian variable.

The phase noise is a continuous Brownian motion (Wiener-Lévy)
process with Gaussian statistics. The primary statistical
properties of θ(t) are easily specified, but the probability
distribution of Y_k is difficult to determine. Foschini and
Vannucci [2] obtained a closed form approximate result by
expanding the integrand e^{jθ(t)} in (3) in a Taylor series
and keeping the first terms.

The decision variable U is the sum of L = T/T' equally
distributed independent variables |V_k|^2, and the approximate
moment-generating function (mgf) of U is


Vu{s) 


1 

(WF 


exp 


sinchi 


2/3mis 


(1  -*)L2 


-Lf  2 

(4) 
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where  ”sinch”  denotes  the  hyperbolic  sinc-function. 

The parameter β = 2πB_L T is equal to 2π times the product of
the data symbol interval T and B_L, the sum of the 3-dB
linewidths of the lasers at the transmitter and the local
oscillator. The quantity m = A²T/2 = A²LT'/2 is the expected
number of photoelectrons in the received optical pulse.

III.  Frequency-Shift  Keying 

Frequency-Shift  Keying  (FSK)  is  readily  analyzed  utiliz¬ 
ing  the  results  from  ASK  since  an  FSK  receiver  contains 
two  branches,  each  identical  to  an  ASK  receiver. 

IV.  Differential  Phase-Shift  Keying 

In  Differential  Phase-Shift  Keying  (DPSK)  the  phase  of 
the  transmitted  optical  field  is  modulated  and  the  phase 
of  the  previous  signal  is  used  as  a  phase  reference  in  the 
receiver. 

We consider the case without predetector filtering, where
T' = T and one sample per signal interval is generated.

An  approximate  mgf  of  the  decision  variable  U  is 

exp  fe) 

«tr(»)  =  V  U  (5) 

y/i  -  (2ro/3/3  +  1  )2s2v/Ur^  v 

V.  Error  Probability 

The moment-generating function determines the statistical
distribution of the decision variable. The transmission error
probability is easy to calculate from the mgf of U using the
saddlepoint approximation suggested by Helstrom [3]. The
optimal value of the prefilter bandwidth parameter L is readily
determined by this procedure.

The  theory  presented  applies  also  to  receivers  with  an 
optical  preamplifier  in  the  presence  of  phase  noise.  We 
refer  to  the  text  by  Einarsson  [4]  for  a  discussion. 
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Abstract  —  Some  ideas,  methods,  and  results  of  cod¬ 
ing  and  design  theory,  especially  a  duality  in  bounding 
the  optimal  size  of  codes  and  designs  (orthogonal  ar¬ 
rays),  are  used  to  solve  a  new  problem  connected  with 
randomized  systems  of  functions. 

I.  Introduction 

A system of functions in n variables is called randomized if
the functions preserve the property of their variables to be
independent and uniformly distributed random variables. Such a
system is called t-resilient if for any substitution of
constants for any i variables, 0 ≤ i ≤ t, the derived system of
functions in n − i variables is also randomized. A system of N
Boolean functions in n variables of which any T form a
t-resilient system is referred to as an (n, t, N, T)-system
(1 ≤ T ≤ N, 0 ≤ t ≤ n). We investigate the problem of finding
the maximum number N = N(n, t, T) of functions in an
(n, t, N, T)-system. This problem is reduced to the
minimization of the size of certain combinatorial designs,
which we call split orthogonal arrays (SOA). A binary code C of
length n + N is called an (n, t, N, T)-SOA if for any choice of
a binary word of length t + T and any choice of t + T places,
of which t belong to the first n places and T belong to the
last N places, there are exactly |C| 2^{-t-T} code words which
contain this word in these places. Let B(n, t + 1, N, T + 1) be
the minimum size of a code C which is an (n, t, N, T)-SOA.

II.  Linear  programming  bounds 

An (n, t, N, T)-system exists if and only if there exists a systematic
(n, t, N, T)-SOA with the first n information symbols. This gives the
following necessary condition for the existence of an (n, t, N, T)-system:

$2^n \ge B(n, t+1, N, T+1)$.

We extend the linear programming method of Delsarte [1] to obtain a lower
bound on B(n, t + 1, N, T + 1) and an upper bound on N(n, t, T). Let A(n, d)
(B(n, d)) be the maximum (minimum) size of a code of length n with the
minimal distance (respectively, with the dual distance) d. Let

$K_k^n(z) = \sum_{j=0}^{k} (-1)^j \binom{z}{j} \binom{n-z}{k-j}$

be the Krawtchouk polynomial of degree k and suppose that for an arbitrary
polynomial $f(z) = \sum_{i=0}^{n} f_i K_i^n(z)$, $\Omega(f) = f(0)/f_0$. If
$A^*(n,d) = \min \Omega(f)$, where the minimum is taken over all polynomials
f(z) such that $f_0 > 0$, $f_i \ge 0$ for i = 1, 2, ..., n, and $f(0) > 0$,
$f(i) \le 0$ for i = d, ..., n; and $B^*(n,d) = \max \Omega(f)$, where the
maximum is taken over all polynomials f(z) such that $f_0 > 0$, $f_i \le 0$
for i = d, ..., n, and $f(0) > 0$, $f(i) \ge 0$ for i = 1, 2, ..., n, then
by the Delsarte inequalities,

$A(n,d) \le A^*(n,d)$, $B(n,d) \ge B^*(n,d)$.

This work was partially supported by RFBR under grant 95-011-03 and by ISF
under grant MEF300.

Delsarte [1] found an f(z) which gives the Rao bound

$B^*(n,d) \ge R(n,d)$,

where $R(n, 2l+1+a) = 2^a \sum_{i=0}^{l} \binom{n-a}{i}$ when
$a \in \{0, 1\}$. The author [2] found polynomials which imply

$A^*(n,d) \le L(n,d) = \begin{cases} L_k^n(d) & \text{if } d_k(n-1) < d-1 \le d_{k-1}(n-2), \\ 2L_k^{n-1}(d) & \text{if } d_k(n-2) < d-1 \le d_k(n-1), \end{cases}$

where $d_k(n)$ is the smallest root of $K_k^n(z)$ and


Using the linear programming method for bounding B(n, t + 1, N, T + 1) and
the important relationship $A^*(n,d) B^*(n,d) = 2^n$ proved in [3], we obtain

Theorem 1. If there exists an (n, t, N, T)-system, then
$2^n \ge R(n, t+1) R(N, T+1)$, $L(n, t+1) \ge R(N, T+1)$, and

$2^n L(N, T+1) \ge 2^N R(n, t+1)$.


III.  Sufficient  condition 

Let l(n, d) be the minimum number of information symbols in a systematic
binary code of length n with the dual distance d. In [4] it was shown that
the condition $T \le n - l(n, t+1)$ is sufficient for the existence of an
(n, t, T, T)-system. In the general case we have

Theorem 2. If $l(n, t+1) + l(N, T+1) \le n$, then there exists an
(n, t, N, T)-system.

Theorems  1  and  2  give  rise  to  good  asymptotic  bounds  on 
N(n,t,T)  and  imply  complete  results  in  some  cases. 

Examples. For any h = 2, 3, ...,

$N(n, 3, n/2 - 1) = n$ if $n = 2^{h+1}$,

$N(n, 5, (n - \sqrt{n})/2 - 1) = n$ if $n = 2^{2h}$,

$N(n, 3, 5) = \sqrt{2^{n-1}/n}$ if $n = 2^{4h-1}$.

Indeed, the existence of the Hamming, Kerdock and Preparata codes implies
that $l(n, 4) = \log 2n$, $l(n, n/2) = n - \log 2n$ when $n = 2^{h+1}$, and
$l(n, 6) = 2 \log n$, $l(n, (n - \sqrt{n})/2) = n - 2 \log n$ when
$n = 2^{2h}$. This gives the corresponding lower bounds by Theorem 2. On the
other hand, $R(n, 4) = 2n$, $L(n, n/2) = 2n$, $R(n, 6) = n^2 - n + 2$,
$L(n, (n - \sqrt{n})/2) = n^2 + \ldots - 1(n - 2)/2$ and the same upper
bounds follow from the inequalities of Theorem 1.
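
The Rao bound and the first inequality of Theorem 1 are easy to evaluate numerically. The sketch below is only an illustration: the function names `rao_bound` and `necessary_condition` are ours, and the check covers only the inequality $2^n \ge R(n,t+1)R(N,T+1)$, not the bounds involving L. It confirms the closed forms R(n, 4) = 2n and R(n, 6) = n^2 - n + 2 and shows that, for n = 2^7 (the case h = 2 of the third example), the value N = sqrt(2^(n-1)/n) passes the test while N + 1 fails it.

```python
from math import comb, isqrt


def rao_bound(n, d):
    """R(n, d) for d = 2l + 1 + a with a in {0, 1}."""
    a = (d - 1) % 2
    l = (d - 1 - a) // 2
    return 2**a * sum(comb(n - a, i) for i in range(l + 1))


def necessary_condition(n, t, N, T):
    """First inequality of Theorem 1: 2^n >= R(n, t+1) * R(N, T+1)."""
    return 2**n >= rao_bound(n, t + 1) * rao_bound(N, T + 1)


# Closed forms quoted in the examples.
for n in (8, 16, 128):
    assert rao_bound(n, 4) == 2 * n
    assert rao_bound(n, 6) == n * n - n + 2

# Third example with h = 2, i.e. n = 2^(4h-1) = 128.
n = 2**7
N = isqrt(2**(n - 1) // n)
print(N, necessary_condition(n, 3, N, 5), necessary_condition(n, 3, N + 1, 5))
```

Python integers are unbounded, so the comparison with 2^128 is exact.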
Abstract — McEliece proposed a public-key cryptosystem based on binary
linear codes, in particular binary classical Goppa codes. In this talk we
will look at various aspects of McEliece's scheme in the general setting of
q-ary codes. In particular, we consider schemes based on the much larger
class of q-ary algebraic-geometric (AG) Goppa codes, subfield subcodes of AG
codes, and concatenated codes. We will give explicit constructions of
several schemes which have very high work factor, excellent
key-length/plain-text ratios, and relatively smaller size of the keys for
given work factors. We will also present modifications and generalizations
of the scheme following Krouk and others. Finally, we will discuss some open
problems.

I.  Introduction 

In 1978, McEliece [2] introduced a public key cryptosystem (PKS) based on
binary linear codes and suggested the implementation of his scheme by
randomly selecting the generator matrix of a [1024, 524, 101] Goppa code and
suitably modifying it. The security of this scheme is based on the
well-known NP-completeness of the decoding problem for general linear codes
and the fact that there are a huge number of inequivalent Goppa codes with
the given parameters.
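
To make the mechanics of the scheme concrete, here is a toy sketch in which the Goppa code is replaced by the [7, 4, 3] binary Hamming code, which corrects one error. This substitution is ours and offers no security; the scrambling matrix `S`, the permutation `PERM`, and all function names are illustrative choices. The public key is G' = SGP, encryption adds a weight-one error to mG', and decryption undoes the permutation, decodes, and removes the scrambling.

```python
from itertools import product

# Systematic Hamming code: G = [I | A], H = [A^T | I].
A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[int(i == j) for j in range(4)] + A[i] for i in range(4)]
H = [[A[i][r] for i in range(4)] + [int(r == c) for c in range(3)]
     for r in range(3)]


def vec_mat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % 2
            for j in range(len(M[0]))]


def gf2_inverse(M):
    """Inverse of a square binary matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


S = [[1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1], [1, 1, 0, 0]]  # secret scrambler
S_INV = gf2_inverse(S)
PERM = [2, 5, 0, 6, 1, 4, 3]      # secret: public column i is private column PERM[i]


def public_key():
    SG = [vec_mat(row, G) for row in S]
    return [[row[PERM[i]] for i in range(7)] for row in SG]


def encrypt(m, error_pos, g_pub):
    c = vec_mat(m, g_pub)
    c[error_pos] ^= 1             # deliberately added single error
    return c


def decrypt(c):
    y = [0] * 7
    for i, p in enumerate(PERM):  # undo the permutation
        y[p] = c[i]
    syndrome = [sum(h * b for h, b in zip(row, y)) % 2 for row in H]
    if any(syndrome):             # the syndrome equals the column of H at the error
        columns = [[H[r][j] for r in range(3)] for j in range(7)]
        y[columns.index(syndrome)] ^= 1
    return vec_mat(y[:4], S_INV)  # first four symbols are mS


G_PUB = public_key()
print(decrypt(encrypt([1, 0, 1, 1], 3, G_PUB)))
```

With the parameters quoted above the same three steps apply, with a decoder for 50 errors in place of the single-error syndrome lookup.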

For practical applications that need flexibility and complexity (e.g.,
computer communication), we will look at various aspects of McEliece's
scheme using the newer and much larger class of q-ary AG Goppa codes. We
will also present modifications and generalizations of this scheme using the
ideas of Krouk and others. Furthermore, we make some observations on the
cryptanalytic attacks. We first discuss the McEliece PKS in the general
setup applicable to q-ary codes. We show by analysis, by examples, and by
heuristics that the complexity of breaking this scheme under one widely
discussed attack is greater than previously believed.

II.  The  Proposed  Schemes 

Our constructions, modifications, and generalizations are based on the
following coding schemes:

(A) AG codes defined over a finite field with q elements (immensely many
choices are attained by varying various parameters of the corresponding
curves);

(B) Subfield subcodes of q-ary codes in (A). This includes binary AG codes,
some of which perform better than binary Goppa codes.

(C) Concatenation of $q^m$-ary AG codes with good q-ary codes.

In each of the cases (A)-(C), we give explicit constructions of schemes
where the work factor is quite substantial. They have excellent
key-length/plain-text ratios and relatively smaller size of the key for the
same work factor. The decrypting complexity for schemes based on plane
curves, especially maximal curves, is $O(n^3)$ or better.


O.  Moreno 
III. On the attack of Sidelnikov and Shestakov
Sidelnikov and Shestakov (S-K) [4] have shown that the Niederreiter PKS [3]
(known now to be equivalent to the McEliece PKS) is insecure for the
particular case of generalized Reed-Solomon codes.

The attack of S-K depends fundamentally on the Vandermonde structure of the
generalized RS codes (and also on their MDS property), and is not applicable
to systems based on other types of codes. For various considerations, our
schemes are excellent alternatives.

IV. Implementing Krouk and Gabidulin Modifications

Krouk [1] strengthens the McEliece scheme by trying to remove symmetry from
the coding scheme. We make some observations on his modification and show
that AG codes are particularly suitable for it.

We will also present improvements, modifications, and implementations of
some recent PKS schemes of Gabidulin.

V.  Further  Improvements  and  Open  Problems 
We  show  that  many  more  constructions  of  PKS  using  AG 
codes  are  possible  if  certain  curves  that  are  maximal  or  near 
maximal  exist. 

Some of the results summarized in this article will appear in detail in
Designs, Codes and Cryptography.
I. Introduction

This paper examines the growth of the degrees of binary trinomials that are
divisible by a fixed binary primitive polynomial f(x) of degree n. Our goal
is to find a heuristic distribution that depends only on n. Our motivation
stems from some suggested correlation attacks on certain stream ciphers
[1, 2, 3]. These attacks use binary relations (binary polynomials) as parity
checks in order to recover information about the cipher key. Low weight
relations perform best but require a longer sequence because of their large
degrees.

II.  Binary  Trinomials 

Let $\alpha$ be a primitive element of $GF(2^n)$ with minimum polynomial
f(x). We consider the set of 3-term or trinomial relations
$\alpha^b + \alpha^a + 1 = 0$, $b > a > 0$. That is, f(x) divides
$x^b + x^a + 1$. If $b \in \{1, \ldots, 2^n - 2\} = X$, then $\alpha^b + 1$
is some power of $\alpha$ with exponent also in X. Thus the trinomials
partition X into pairs. We denote the set of all trinomials as an ordered
listing of ordered pairs:

$\{(b_i, a_i) \mid b_i > a_i;\ b_i \text{ increasing};\ i = 1, \ldots, 2^{n-1} - 1\}$   (1)

We call such an ordered listing of pairs a pattern of trinomials. As an
example, the two trinomial patterns for n = 3 are

(3,1) (5,4) (6,2) and (3,2) (5,1) (6,4).

Consider all partitions of X that are in the canonical form of (1); we call
such partitions patterns. We take all patterns to be equally likely, and we
model choosing a random trinomial pattern, i.e., a primitive polynomial, as
choosing a random pattern. We model the distribution of trinomial degrees by
the distribution of the size of $b_i$ over all patterns:

$R_i(k) = \mathrm{Prob}(b_i = k)$

In particular, $R_1(k)$ models the distribution of the lowest degree
trinomial. We derive the distributions by considering patterns as in (1)
defined for a general index set $X = \{1, \ldots, N = 2M\}$. The
distribution $R_i(k)$ is only nonzero for k between 2i and M + i.

III. Formula for $R_i(k)$

The calculation of the $R_i(k)$ is combinatorial and follows from
calculating the total number of patterns and those patterns with $b_i = k$.
(We also have an alternative derivation as a classical "birthday problem" in
probability.)

Proposition 1 For N = 2M and k = 2i, ..., M + i,

$R_i(k) = 2^{k-2i+1} \frac{(k-1)!\,(N-k)!\,M!}{(k-2i)!\,(i-1)!\,N!\,(M-k+i)!}$

In Figure 1, we plot in ascending order (the V curve) the degrees of the
trinomials divisible by $f(x) = x^{16} + x^5 + x^3 + x^2 + 1$. Since the
x-coordinates correspond to the i index in $R_i(k)$, we also plot for a
given i the mean of the $R_i(k)$ distribution and ±2 standard deviations
from the mean. Though the actual trinomial curve levels off sooner than the
model, the model captures the essence of the growth in the degrees.

IV. Approximate Distributions

As N gets large, $R_1(k) \approx \frac{k}{N} e^{-k^2/2N}$. This
approximation yields an approximation for the mean: $\sqrt{N\pi/2}$. The
general distribution can be approximated similarly:

$R_i(k) \approx \frac{k^{2i-1}}{2^{i-1} N^i (i-1)!} e^{-k^2/2N}$

If we replace k with a continuous variable, then the distribution is in fact
a generalized Rayleigh distribution with parameters 2i and $\sqrt{N}$. In
particular, a good approximation to the mean of $R_i(k)$ is

$\frac{1 \cdot 3 \cdots (2i-1)}{2^{i-1} (i-1)!} \sqrt{N\pi/2}$

Such formulas present an easy way to generate the model curves as in
Figure 1 and offer an analytic method to measure the growth of the degrees
of the binary trinomials.

The authors were supported by the MITRE Sponsored Research program.
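
The pattern of a given primitive polynomial and the model distribution can both be computed directly. The following sketch is an illustration under our own naming (`trinomial_pattern`, `r_exact`, `r_approx`); polynomials are passed as bit masks, which is a convenience of this sketch and not notation from the paper. It reproduces the two patterns for n = 3, evaluates Proposition 1 exactly with rational arithmetic, and compares the exact mean of $R_1$ with the Rayleigh value $\sqrt{N\pi/2}$.

```python
from fractions import Fraction
from math import exp, factorial, pi, sqrt


def trinomial_pattern(poly, n):
    """Pairs (b, a), b > a, such that x^b + x^a + 1 is divisible by `poly`.

    `poly` is a primitive polynomial of degree n given as a bit mask,
    e.g. 0b1011 for x^3 + x + 1.
    """
    order = 2**n - 1
    powers, x = [], 1
    for _ in range(order):            # powers[e] = alpha^e as a bit mask
        powers.append(x)
        x <<= 1
        if x >> n:
            x ^= poly
    log = {v: e for e, v in enumerate(powers)}
    pairs = []
    for b in range(1, order):
        a = log[powers[b] ^ 1]        # alpha^b + 1 = alpha^a
        if b > a:
            pairs.append((b, a))
    return pairs                      # listed by increasing b


def r_exact(i, k, M):
    """Proposition 1: Prob(b_i = k) over all patterns on {1, ..., N = 2M}."""
    N = 2 * M
    if not 2 * i <= k <= M + i:
        return Fraction(0)
    num = 2**(k - 2*i + 1) * factorial(k - 1) * factorial(N - k) * factorial(M)
    den = (factorial(k - 2*i) * factorial(i - 1)
           * factorial(N) * factorial(M - k + i))
    return Fraction(num, den)


def r_approx(i, k, N):
    """Generalized Rayleigh approximation with parameters 2i and sqrt(N)."""
    return (k**(2*i - 1) * exp(-k * k / (2 * N))
            / (2**(i - 1) * N**i * factorial(i - 1)))


print(trinomial_pattern(0b1011, 3), trinomial_pattern(0b1101, 3))

M = 200                               # arbitrary size chosen for the comparison
exact_mean = float(sum(k * r_exact(1, k, M) for k in range(2, M + 2)))
print(exact_mean, sqrt(2 * M * pi / 2))
```

Using `Fraction` keeps the factorial ratios exact, so the probabilities for a fixed i sum to exactly 1.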
Abstract — This paper analyzes risks and presents the requirements of a
digital multisignature scheme in electronic contract systems. A new digital
multisignature scheme suitable for contract systems is proposed and the
efficiency of the scheme is discussed.

I.  Introduction 

The electronic contract system needs to replace handwritten signatures with
digital signatures. Digital multisignatures might also be needed in
environments where several persons must sign the same digital message.

There are the following potential risks: signature forgery, contract with
an unauthorized party, denial of contract, misuse of a contractor's
signature, and malicious contract destruction. It is desired that the
digital multisignature scheme satisfy the following requirements in the
electronic contract system: verifiability, viability,
dishonesty-detectability, commonness (common procedures), generality, and
orderlessness.

This paper assumes that m users join the electronic contract system and
sign the same contract message, and all signers are connected by a bridge
node or MCU. Let M be the contract message to be signed. f and h denote
public one-way functions which are easily computable and are hard to invert.
Let $ID_i$ and $IDC_m$ denote the identification information of user
(contractor) i and the concatenation of the signers' IDs, i.e.

$IDC_m = ID_1 \| ID_2 \| \cdots \| ID_m$.

II.  Key  generation  and  publication 

Signer i registers his identification information ($ID_i$) and the trusted
center issues a smart card as follows:

1. The trusted center selects two large prime numbers p and q, and keeps
them secret.

2. The trusted center publishes a modulus N which is the product of p and q.

3. The trusted center calculates integers $S_{ij}$ for signer i:

$I_{ij} = f(ID_i, j)$, j = 1, ..., k   (1)

$I_{ij}^{-1} = S_{ij}^2 \pmod{N}$   (2)

4. The center issues a smart card to signer i after identifying his
physical identity.

The smart card includes the set of $(N, f, h, S_{i1}, \ldots, S_{ik})$.

III.  Multisignature  generation 

1. The signer n generates a random integer $R_n \in Z_N$ and calculates

$X_n = R_n^2 X_{n-1} \pmod{N}$   (3)

$(e_{n1}, \ldots, e_{nk}) = h(M, IDC_m, X_n)$   (4)

$Y_n = Y_{n-1} R_n \prod_{e_{nj}=1} S_{nj} \pmod{N}$   (5)

where $X_0 = 1$, $Y_0 = 1$ and j = 1, ..., k.

2. The signer n broadcasts $(X_n, Y_n)$ to all the other signers.


IV.  Multisignature  verification 

When the multisignature generation procedures are completed, the verifier
or signer gets the multisignature $(M, IDC_m, X_1, \ldots, X_m, Y_m)$. The
verifier calculates

$(e_{i1}, \ldots, e_{ik}) = h(M, IDC_m, X_i)$, i = 1, ..., m   (6)

and stores only
$(M, IDC_m, (e_{11}, \ldots, e_{1k}), \ldots, (e_{m1}, \ldots, e_{mk}), Y_m)$
for verification of the multisignature. When multisignature verification is
requested, the verification procedures are as follows:

1. The verifier calculates $I_{ij}$ with $IDC_m$:

$I_{ij} = f(ID_i, j)$, i = 1, ..., m, j = 1, ..., k   (7)

2. The verifier calculates $Z_m$:

$Z_m = Y_m^2 \prod_{i=1}^{m} \prod_{e_{ij}=1} I_{ij} \pmod{N}$, j = 1, ..., k   (8)

3. The verifier calculates $h(M, IDC_m, Z_m)$ and checks whether the
equation

$(e_{m1}, \ldots, e_{mk}) = h(M, IDC_m, Z_m)$   (9)

holds true.

If it does, the multisignature message is considered to be valid.
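
Equations (1)-(9) can be exercised end to end with toy parameters. The sketch below is an assumption-laden illustration, not the authors' implementation: the modulus is built from two small Mersenne primes, and the public functions `f` and `h` are hypothetical instantiations built on SHA-256, with `f` squaring its hash value so that every $I_{ij}$ is guaranteed to have the square root the center needs. The modular inverse via `pow(x, -1, N)` requires Python 3.8 or later.

```python
import hashlib
import math
import random

P, Q = 2**31 - 1, 2**19 - 1     # toy primes, both 3 mod 4; far too small for real use
N = P * Q
K = 32                          # number of challenge bits k


def _hash_int(*parts):
    data = "|".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def f(identity, j):
    """Hypothetical f: maps (ID, j) to an invertible quadratic residue mod N."""
    counter = 0
    while True:
        u = _hash_int("f", identity, j, counter) % N
        if math.gcd(u, N) == 1:
            return u * u % N
        counter += 1


def h(message, idc, x):
    """Hypothetical h: K challenge bits derived from (M, IDC_m, X)."""
    v = _hash_int("h", message, idc, x)
    return tuple((v >> j) & 1 for j in range(K))


def _sqrt_mod_n(a):
    """Square root of a quadratic residue mod N, using the secret p and q."""
    rp = pow(a, (P + 1) // 4, P)
    rq = pow(a, (Q + 1) // 4, Q)
    return (rp + P * ((rq - rp) * pow(P, -1, Q) % Q)) % N   # CRT


def issue_secrets(identity):
    """Trusted center, eqs. (1)-(2): S_ij with S_ij^2 * I_ij = 1 (mod N)."""
    return [_sqrt_mod_n(pow(f(identity, j), -1, N)) for j in range(1, K + 1)]


def sign(secrets_n, message, idc, x_prev, y_prev, rng):
    r = rng.randrange(2, N)
    x = r * r * x_prev % N                        # eq. (3)
    e = h(message, idc, x)                        # eq. (4)
    y = y_prev * r % N                            # eq. (5)
    for bit, s in zip(e, secrets_n):
        if bit:
            y = y * s % N
    return x, e, y


def verify(message, ids, challenges, y_m):
    idc = "||".join(ids)
    z = y_m * y_m % N                             # eq. (8)
    for identity, e in zip(ids, challenges):
        for j, bit in enumerate(e, start=1):
            if bit:
                z = z * f(identity, j) % N        # eq. (7)
    return h(message, idc, z) == challenges[-1]   # eq. (9)


ids = ["alice", "bob", "carol"]                   # made-up signer identities
idc = "||".join(ids)
rng = random.Random(1)
x, y, challenges = 1, 1, []
for identity in ids:
    x, e, y = sign(issue_secrets(identity), "contract text", idc, x, y, rng)
    challenges.append(e)
print(verify("contract text", ids, challenges, y))
```

Verification succeeds because $Y_m^2 \prod I_{ij} = \prod R_i^2 \prod (S_{ij}^2 I_{ij}) = X_m$, so $Z_m$ reproduces the value the last signer hashed.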

V.  Efficiency  and  Conclusions 

The proposed digital multisignature scheme satisfies all the requirements
of a multisignature scheme in electronic contract systems. The proposed
scheme requires (k/2 + 3)t modular multiplications to generate a signature
and m transmissions to complete the multisignature procedure, and an
information redundancy of (m|ID| + ktm + |N|) bits must be stored for
multisignature verification, where t is a security level parameter.

Since the newly proposed multisignature scheme is based on the Fiat-Shamir
scheme, it is more efficient than other RSA-based multisignature schemes and
as secure as the Fiat-Shamir scheme. Owing to its high processing speed and
the high degree to which it satisfies the requirements, the newly proposed
scheme is suitable for electronic contract systems.
Abstract — We introduce a new model, the broadcast channel with
confidential messages, with tampering. Here, the enemy not only taps the
wire but also actively tampers with the signal communicated over the wire.
We show that the legitimate users always have to take a certain worst case
scenario into account.

Csiszar and Korner [1] introduced the broadcast channel with confidential
messages (BCC). It consists of three participants: two legitimate users of
the main channel, Alice and Bob, and a wire-tapper, Eve, the enemy. Alice
and Bob want to generate a secret key such that Eve can only obtain a
negligible amount of information about it. It is assumed that all players
know everything: the codes and protocol used by the legitimate users, and
the noise characteristics of the main and wire-tap channel. The central
question is to determine the secrecy capacity $C_s$, which is the maximal
rate at which Alice and Bob can generate a secret key.

We introduce the BCC with tampering (BCCT), in which in addition Eve
actively tampers. Now, solely Eve is assumed to know everything. Alice and
Bob can measure the noise characteristics of the main channel, and they have
limited knowledge about the noise characteristics of the wire-tap channel.

If Alice wants to transmit the binary signal $a \in \{0,1\}^n$ then she
converts it into a polar analog signal a(t) with signal power $S_A$, which
she transmits to Bob over a distortionless channel with length $l_A + l_B$
and attenuation coefficient $\alpha$. The first part of the main channel,
from Alice to the position where Eve taps the wire, has length $l_A$ and,
hence, transmission loss $(L_A)_{dB} = \alpha l_A$. The second part of the
main channel has length $l_B$ and transmission loss
$(L_B)_{dB} = \alpha l_B$. Bob uses an amplifier with noise figure $n_B$ and
power gain $g_B$ to obtain an analog signal b(t), which he converts to a
binary signal $b \in \{0,1\}^n$. The wire-tap channel of Eve causes
transmission loss $L_E$. Eve uses an amplifier with noise figure $n_E$ and
power gain $g_E$ to obtain an analog signal e(t), which she converts to a
binary signal $e \in \{0,1\}^n$. The noise caused in both amplifiers is
additive white Gaussian noise with zero mean, independent of the signals
b(t) and e(t). We assume that the electrical noise of the channels is
negligible compared to the noise caused in both amplifiers. Finally, Eve has
inserted a tamper device which causes additional transmission loss
$(L_T)_{dB} = \epsilon_T (l_A + l_B)$, independent of her signal e(t).

Alice and Bob know $S_A$, $n_B$ (which they can measure as accurately as
they like), $l_A$, and $l_B$. They only know a probability distribution
$\Pr(\alpha)$ of the attenuation coefficient, and they know
$\underline{L}_E$ and $\underline{n}_E$ with $\underline{L}_E \le L_E$ and
$\underline{n}_E \le n_E$. In the worst case for Alice and Bob,
$L_E = \underline{L}_E$ and $n_E = \underline{n}_E$. The tamper device
introduces additional transmission loss $L_T$. Since the exact value of the
transmission loss over the main channel is unknown, $L_T$ is unknown. Alice
and Bob can only obtain statistical information about $L_T$ (as we shall
see).

The signal power of b(t) is $S_A g_B / (L_A L_T L_B)$, and the
corresponding noise power is $n_B g_B$. Hence, the signal to noise ratio of
the main channel equals

$(S/N)_{AB} = S_A / (L_A L_T L_B n_B) = S_A / (n_B 10^{(\alpha + \epsilon_T)(l_A + l_B)/10})$.

Alice and Bob view this as a function of $\alpha + \epsilon_T$. Thus the
channel from Alice to Bob with input a and output b is a
BSC($p_{AB}(\alpha + \epsilon_T)$) with
$p_{AB}(\alpha + \epsilon_T) = Q(\sqrt{(S/N)_{AB}})$, that is, a binary
symmetric channel with cross-over probability $Q(\sqrt{(S/N)_{AB}})$, where

$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-y^2/2}\,dy$.

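Suppose that prior to the secret key generation Alice transmits m zeros to
Bob. Let the random variable k(m) be the number of 1s Bob receives over the
main channel. Let $P(x) = \Pr(\alpha > x)$. Then we can derive

$\Pr\left( \left| p_{AB}(\alpha + \epsilon_T) - \frac{k(m)}{m} \right| < \epsilon,\ \alpha > x \right) \ge \left( 1 - \frac{1}{4m\epsilon^2} \right) P(x)$.

Hence, for m large enough Alice and Bob may approximate
$p_{AB}(\alpha + \epsilon_T)$ by k(m)/m, which leads to an approximation
$\hat{\alpha}(k(m)/m)$ of $\alpha + \epsilon_T$
($\hat{\alpha} = p_{AB}^{-1}$). More precisely

$\Pr\left( \left| \alpha + \epsilon_T - \hat{\alpha}(k(m)/m) \right| < \epsilon,\ \alpha > x \right) \ge \left( 1 - \frac{1}{4m\delta(\epsilon)^2} \right) P(x)$.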


The signal power of e(t) is $S_A g_E / (L_A L_E)$, and the corresponding
noise power is $n_E g_E$. Hence, the signal to noise ratio of the channel
from Alice to Eve equals
$(S/N)_{AE} = S_A / (L_A L_E n_E) \le S_A / (10^{\alpha l_A/10} \underline{L}_E \underline{n}_E)$.
Thus the channel from Alice to Eve with input a and output e is a
BSC($p_{AE}$) with
$p_{AE} = Q(\sqrt{(S/N)_{AE}}) \ge Q(\sqrt{S_A / (10^{\alpha l_A/10} \underline{L}_E \underline{n}_E)}) = \underline{p}_{AE}(\alpha)$.

We notice that all noise is generated in both amplifiers. Hence, the BCCT
is equivalent to the binary symmetric BCC, in which the main channel is a
BSC($p_{AB}(\alpha + \epsilon_T)$) and the wire-tap channel is a
BSC($p_{AE}$) from Alice to Eve. With probability at least
$(1 - 1/4m\delta(\epsilon)^2) P(x)$ the worst-case scenario for Alice and
Bob is a BSC($p_{AB}(\hat{\alpha}(k(m)/m) + \epsilon)$) as main channel and
a BSC($\underline{p}_{AE}(x)$) as wire-tap channel. The secrecy capacity of
the worst-case scenario is equal to
$C_s(x, k(m), m, \epsilon) = h(\underline{p}_{AE}(x)) - h(p_{AB}(\hat{\alpha}(k(m)/m) + \epsilon))$
[1]. We notice that a secret key generated in the worst-case scenario is
also secret (for Eve) in the real situation. We conclude that in the BCCT a
secret key can be generated with rate at least

$\sup\ (1 - 1/4m\delta(\epsilon)^2)\, P(x)\, C_s(x, k, m, \epsilon)\, \Pr(k(m) = k)$.

The final conclusion is: "Alice and Bob need to take tampering by Eve into
account, which implies that they have to realize that k(m)/m is an
approximation of $\alpha + \epsilon_T$, not of $\alpha$".
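
The quantities in this model reduce to a few one-line formulas, sketched below. All numerical values are hypothetical and chosen only to exercise the code; the function names are ours. `q_function` uses the standard identity Q(x) = erfc(x/sqrt(2))/2, `confidence` is the Chebyshev factor 1 - 1/(4 m eps^2), and `secrecy_capacity` is the difference of binary entropies quoted above, without any clamping at zero.

```python
from math import erfc, log2, sqrt


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * erfc(x / sqrt(2.0))


def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def p_ab(alpha_plus_eps, s_a, n_b, l_a, l_b):
    """Cross-over probability of the main channel as a function of alpha + eps_T."""
    snr = s_a / (n_b * 10 ** (alpha_plus_eps * (l_a + l_b) / 10))
    return q_function(sqrt(snr))


def p_ae_worst(alpha, s_a, l_a, loss_e_min, noise_e_min):
    """Smallest cross-over probability Eve can have, given the lower bounds."""
    snr = s_a / (10 ** (alpha * l_a / 10) * loss_e_min * noise_e_min)
    return q_function(sqrt(snr))


def confidence(m, tol):
    """Chebyshev factor 1 - 1/(4 m tol^2) for the estimate k(m)/m."""
    return 1.0 - 1.0 / (4 * m * tol**2)


def secrecy_capacity(p_main, p_tap):
    """h(p_tap) - h(p_main), as in the worst-case formula."""
    return binary_entropy(p_tap) - binary_entropy(p_main)


# Hypothetical numbers: powers in linear units, lengths in km, alpha in dB/km.
main = p_ab(0.4, s_a=20.0, n_b=2.0, l_a=5.0, l_b=5.0)
tap = p_ae_worst(0.3, s_a=20.0, l_a=5.0, loss_e_min=4.0, noise_e_min=2.0)
print(main, tap, secrecy_capacity(main, tap), confidence(10_000, 0.05))
```

A larger combined coefficient alpha + eps_T lowers the main-channel signal to noise ratio, which is why tampering that Alice and Bob ignore would make them overestimate the achievable key rate.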
Abstract — The two notions in the title coincide.

I.  Introduction 

Secret sharing schemes (SSS) made their appearance (see [1], [2]) in the
form of threshold (n, r)-schemes in 1979. R. McEliece and D. Sarwate pointed
out [3] a relationship between threshold schemes and MDS codes in 1981. In
1983 E. Karnin, J. Greene and M. Hellman [4] gave an information-theoretic
approach to SSS and proved some upper and lower bounds on the number of
participants in an ideal perfect threshold SSS. The proof is based, in fact,
on the observation that each ideal perfect threshold SSS determines a unique
MDS code, and vice versa, when the secret and shadows belong to the same
finite field. E. F. Brickell and D. M. Davenport [6] considered
combinatorial ideal perfect SSS for the general access structure and
established the relationship between such schemes and matroids. From their
results the equivalence of combinatorial ideal perfect threshold SSS and MDS
codes (i.e. orthogonal arrays $OA_1(t, n+1, q)$) follows almost immediately.
In this paper we give an independent, self-contained proof (following the
ideas in [4]) for the (formally) more general information-theoretic
definition of ideal SSS.

II.  Definitions  and  a  useful  lemma 
Let $\mathcal{S}_0, \mathcal{S}_1, \ldots, \mathcal{S}_n$ be finite sets
used by an SSS dealer as alphabets: $\mathcal{S}_0$ is for the secret, and
the other $\mathcal{S}_i$ for shares. We call a point
$s = (s_0, s_1, \ldots, s_n) \in \mathcal{S} = \mathcal{S}_0 \times \cdots \times \mathcal{S}_n$
a sharing rule. Any SSS can be defined as a probability distribution P(s) on
$\mathcal{S}$, which the dealer uses for generating sharing rules, i.e. for
choosing a secret $s_0$ and giving a corresponding share $s_i$ to the i-th
participant.

Let $\Gamma$ be an access structure, i.e. a set of subsets of
{1, ..., n} with the monotonic property ($A \in \Gamma$, $A \subset B$ imply
$B \in \Gamma$). Consider $S_0, \ldots, S_n$ to be random variables with P
as their joint distribution. We call a pair $(P, \mathcal{S})$ a perfect
SSS, realizing an access structure $\Gamma$, if (see [4], [5])
$H(S_0 \mid S_i, i \in A) = 0$ or $H(S_0)$ according as $A \in \Gamma$ or
not.

Denote by $\Gamma_{min}$ the set of minimal subsets of $\Gamma$. The
following lemma (see [5]) is very useful.

Lemma 1 $H(S_j \mid S_i, i \in A \setminus \{j\}) \ge H(S_0)$ for any
$A \in \Gamma_{min}$ and any $j \in A$.

Corollary 1 $H(S_i, i \in A) \ge |A| \cdot H(S_0)$ for any
$A \in \Gamma_{min}$.

III.  An  equivalence  involving  the  combinatorial 
definition 

We call $V = \{s \in \mathcal{S} \mid P(s) > 0\}$ the "code" of the SSS
$(P, \mathcal{S})$. Let $q = |\mathcal{S}_0|$. Let us note that if the pair
$(P, \mathcal{S})$ perfectly realizes an SSS for the access structure
$\Gamma$ for some distribution $p(s_0)$ on secrets, then any distribution on
$\mathcal{S}_0$ can be perfectly realized by the same $\mathcal{S}$, the
code V and an appropriate choice of P. From this remark and Corollary 1 one
can show that the following are true.

Lemma 2 $|V| \ge q^{|A|}$ for any perfect SSS and any $A \in \Gamma_{min}$.

Corollary 2 For any perfect (n, r)-threshold SSS the cardinality of its
code satisfies the inequality $|V| \ge q^r$.

We will distinguish between two definitions of ideal SSS. The combinatorial
definition of an ideal perfect SSS is that
$|\mathcal{S}_0| = |\mathcal{S}_i|$ for all i. A (formally) weaker
information-theoretic (IT) definition is that $H(S_i) \le H(S_0)$ for all i.
The following corollary of Lemma 1 (see [4]) shows that the set V is a code
with minimal Hamming distance $d(V) \ge n - r + 2$.

Corollary 3 $H(S_j \mid S_{i_1}, \ldots, S_{i_r}) = 0$ for any IT-ideal
perfect (n, r)-threshold SSS and any distinct
$j, i_1, \ldots, i_r \in \{0, 1, \ldots, n\}$, i.e. $S_j$ is a function of
$S_{i_1}, \ldots, S_{i_r}$.

Hence, for the combinatorial definition of an ideal perfect SSS,
Corollaries 2 and 3 ensure that the code V of an ideal perfect
(n, r)-threshold SSS is a q-ary code of length n + 1, distance
$d(V) \ge n - r + 2$ and cardinality $|V| \ge q^r$. Therefore V is an MDS
code with $|V| = q^r$ and $d(V) = n - r + 2$ (the converse, that any MDS
code with the above parameters generates an ideal perfect (n, r)-threshold
SSS, is rather obvious).
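
The converse direction can be checked by brute force on a small case. As an illustration we take the polynomial-evaluation (Shamir-type) threshold scheme over a prime field, which is our choice of example and is not constructed in the text: the sharing rules are the words (f(0), f(1), ..., f(n)) for all polynomials f of degree less than r. The sketch lists this code and verifies that it has $q^r$ words, minimum distance n - r + 2, and that any r coordinates determine the whole word.

```python
from itertools import combinations, product


def shamir_code(q, n, r):
    """All sharing rules (s_0, ..., s_n) of the evaluation scheme over GF(q), q prime."""
    assert n < q, "needs n + 1 distinct evaluation points in GF(q)"
    code = []
    for coeffs in product(range(q), repeat=r):
        word = tuple(
            sum(c * pow(x, e, q) for e, c in enumerate(coeffs)) % q
            for x in range(n + 1)      # x = 0 carries the secret
        )
        code.append(word)
    return code


def min_distance(code):
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(code, 2))


def any_r_coordinates_determine_word(code, r):
    """True if the projection onto every set of r coordinates is injective."""
    length = len(code[0])
    return all(
        len({tuple(w[i] for i in pos) for w in code}) == len(code)
        for pos in combinations(range(length), r)
    )


q, n, r = 5, 4, 2
V = shamir_code(q, n, r)
print(len(V), min_distance(V), any_r_coordinates_determine_word(V, r))
```

Injectivity on every r coordinates is the combinatorial form of Corollary 3: each symbol, the secret included, is a function of any r others.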

IV. The main result - an equivalence involving the information-theoretic
definition
Now we can prove that the same equivalence is true for IT-ideal SSS also.
Denote by $V_A$ the punctured code obtained from V by deleting coordinates
"outside of A" (i.e. not belonging to A). Corollary 3 states that
$|V| = |V_A|$ for any A with |A| = r. On the other hand, the random
variables $S_{i_1}, \ldots, S_{i_r}$ are mutually independent (see Lemma 1).
Therefore,
$|V_{i_1, \ldots, i_r}| = |\mathcal{S}_{i_1}| \cdots |\mathcal{S}_{i_r}|$.
Hence, the cardinalities of all sets $\mathcal{S}_i$ are equal to
$|\mathcal{S}_0| = q$ and we again have the case of combinatorial ideality.
Abstract - In this paper the use of product codes for cryptographic
purposes is discussed. The codes are used in a scheme that applies a special
type of structured errors that, as far as we know, do not exist in any real
communication channel. Although this fact seems of no importance, since the
errors in any cryptosystem based on error-correcting codes are artificially
generated at the transmitter, its use allows an improvement in the security
level in comparison with similar schemes.

I.  SUMMARY 

The use of burst-correcting product codes for cryptographic purposes has
been recently investigated [1], [2], where a private-key cryptosystem was
proposed, based on the fact that the single burst-correcting capacity of a
code is, in general, larger than its random error-correcting capacity. In
this paper the scheme is revisited and a new class of product codes to
implement the cryptosystem is proposed. The key idea of the original scheme
was to use a code which is capable of correcting a special kind of
structured errors and then disguise it as a code that is only linear, which
makes it unable to correct the errors as well as their permuted versions.
Specifically, in the search for such a structure, the choice of bursts and
burst-correcting codes was a natural one in the context of error control
codes. However, we observe that the error structure to be used does not
necessarily have to exist in a real communication channel, since, for
cryptographic purposes, the errors are artificially generated at the
transmitter. With that in mind we introduce the following concepts:

Definition 1 - The direct mapping of parameters l and s, denoted
$DM_{l,s}(\cdot)$, is the one that maps the vector
$v = (v_1, \ldots, v_{ls})$ into the matrix $V = (v_{ij})$ of elements
$v_{ij} = v_{is+j+1}$, i = 0, 1, ..., l-1, j = 0, 1, ..., s-1.

Definition 2 - The vector $e = (e_1, e_2, \ldots, e_{ls})$,
$e_i \in GF(q)$, is said to be a biseparable error over GF(q), denoted
BSE(l,s), if (i) its nonzero components are distinct elements of GF(q) and
(ii) each row and each column of $DM_{l,s}(e)$ contains, at most, one
nonzero component.

From the above definitions, it can be seen that the maximum weight of a
BSE(l,s) over GF(q) is $\omega_{max} = \min(q-1, \min(l,s))$ and the number
of BSE's with a given weight $\omega$
($\omega = 1, 2, \ldots, \omega_{max}$) is

$\binom{q-1}{\omega} \prod_{i=1}^{\omega} (l+1-i)(s+1-i)$.
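
The counting formula can be confirmed by exhaustive enumeration for small parameters. The sketch below is an illustration with our own function names; field elements are represented by the integers 0, ..., q-1, which suffices because only distinctness of the nonzero values matters for Definition 2. Vector index idx (0-based) is placed at row idx // s and column idx % s, matching the direct mapping of Definition 1.

```python
from itertools import product
from math import comb, prod


def is_bse(e, l, s):
    """Definition 2 for a vector e of length l*s (weight zero excluded)."""
    nonzero = [(idx // s, idx % s, v) for idx, v in enumerate(e) if v]
    w = len(nonzero)
    rows = {r for r, _, _ in nonzero}
    cols = {c for _, c, _ in nonzero}
    vals = {v for _, _, v in nonzero}
    return w > 0 and len(rows) == len(cols) == len(vals) == w


def count_by_enumeration(q, l, s):
    """Number of BSE(l, s) over an alphabet of size q, grouped by weight."""
    counts = {}
    for e in product(range(q), repeat=l * s):
        if is_bse(e, l, s):
            w = sum(1 for v in e if v)
            counts[w] = counts.get(w, 0) + 1
    return counts


def count_by_formula(q, l, s, w):
    return comb(q - 1, w) * prod((l + 1 - i) * (s + 1 - i)
                                 for i in range(1, w + 1))


q, l, s = 4, 2, 3
enumerated = count_by_enumeration(q, l, s)
print(enumerated, [count_by_formula(q, l, s, w) for w in (1, 2)])
```

For q = 4, l = 2, s = 3 the maximum weight is min(3, 2) = 2, so only weights 1 and 2 occur.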

Proposition - A product code PC(n, k, d) over GF(q), whose constituent row
and column codes are, respectively, single parity-check codes $C_1$
($N_1 = s+1$, $K_1 = s$, $D_1 = 2$) and $C_2$ ($N_2 = l+1$, $K_2 = l$,
$D_2 = 2$), can correct BSE(l+1, s+1)'s of weights up to $\omega_{max}$.

Denoting by G the generator matrix of PC(n, k, d), the encryption procedure
consists of calculating the ciphertext C from the plaintext M, using
$C = (MSG + E_{l,s,\omega})P$, where $E_{l,s,\omega}$ is a BSE(l+1, s+1) of
weight $\omega$, P is an n x n permutation matrix and S is a k x k
scrambling matrix, used to hide the structure of the matrix GP. The working
factors for breaking the system by some of the attacks that are typically
applied against cryptosystems based on error control codes are related to
the number of codes in the class defined above, which is
$N_c = (l+1)(s+1)(ls)!$. Using G' = SGP, the cryptanalyst must find, among
all $N_c$ matrices, one of the (l+1)! (s+1)! matrices that can be used to
decode the corrupted received vector (ciphertext). That means a working
factor of (ls)!/(l! s!), which compares favourably with the results obtained
by the previous scheme.
Abstract - We consider the problem of partial approximation of binary sequences by the outputs of linear feedback shift registers. A generalization of the linear complexity profiles of binary sequences leads to a sequence that is regarded as the profile of interval linear complexity. Some properties of this sequence are examined.


I.  Introduction 

A widely used criterion of the linear complexity of binary sequences was introduced by R. Rueppel [1]. In accordance with this criterion, we construct an integer-valued sequence, called the 'linear complexity profile' (LCP), whose j-th component, $L_j$, is the shortest length of a linear feedback shift register (LFSR) generating the first j bits of our binary sequence. The component $L_j$ can be found using the Berlekamp-Massey algorithm [2], and pseudo-random sequences possess an LCP with $L_j \approx j/2$ for all j. However, there are many examples in which the deviations of the LCP from the sequence $j/2$, $j = 1, 2, \ldots$, do not characterize the 'randomness' of binary sequences. For example, let us suppose that we are given a sequence $u^{*}01^{\infty}$, where $u^{*}$ is a sequence of length n having a 'good' LCP. Then $u^{*}$ is generated by an LFSR of length $\approx n/2$. Nevertheless, the final result of the Berlekamp-Massey algorithm, applied to the whole sequence, is an LFSR of length n + 2, and the LCP is as good as before up to the $\approx 2n$-th component. It is easy to see that if we construct the LCP of the sequence starting at the (n+1)-st bit then the conclusion would be different. Thus, to extend Rueppel's approach we need to construct the LCPs for all subsequences of the input sequence and select the worst one, whose deviations from the line j/2 should be used as a measure of complexity. Such a procedure seems to be rather complicated.
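For reference, the linear complexity profile discussed above can be computed with the standard Berlekamp-Massey algorithm over GF(2). The following is a textbook sketch, not code from the paper; the function name is hypothetical.

```python
def linear_complexity_profile(bits):
    """Return [L_1, ..., L_n]: L_j is the shortest LFSR length generating
    the first j bits (Berlekamp-Massey over GF(2))."""
    n = len(bits)
    c = [0] * (n + 1)   # current connection polynomial
    b = [0] * (n + 1)   # connection polynomial before the last length change
    c[0] = b[0] = 1
    L, m = 0, -1        # current length, position of the last length change
    profile = []
    for N in range(n):
        # Discrepancy between the predicted bit and the actual bit.
        d = bits[N]
        for i in range(1, L + 1):
            d ^= c[i] & bits[N - i]
        if d:
            previous = c[:]
            shift = N - m
            for i in range(n + 1 - shift):
                c[i + shift] ^= b[i]
            if 2 * L <= N:
                L = N + 1 - L
                m = N
                b = previous
        profile.append(L)
    return profile


if __name__ == "__main__":
    print(linear_complexity_profile([1, 0, 1, 0, 1, 0]))
```

A sequence with a 'good' profile in the sense used above is one whose returned values stay close to j/2.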

In this paper, we introduce a new measure of complexity, called an interval linear complexity. For all L < m, where m is a given positive integer, we find all the fragments of the binary sequence that have length L + m and can be generated by an LFSR of length L. The number of such fragments and their lengths contain information on LCPs for all starting positions, and the results of analysis can be useful for different methods based on linear approximations.

II. Some Properties of the m-Interval Linear Complexity

Let $u = (u_1, u_2, \ldots)$ be a binary sequence. We assume that $u_1 = 1$ and $u_i = 0$ for $i = 0, -1, \ldots$ For all $k > 0$, we set $u_j^{(k)} = (u_{j-k+1}, \ldots, u_j)$ and write $u_j^{(k)} \in F(L)$ iff there exists a binary vector $(a_1, \ldots, a_L)$ such that $u_t = a_1 u_{t-1} + \ldots + a_L u_{t-L}$ for all $t \in \{j-k+1, \ldots, j\}$. Furthermore, we write $u_j^{(k)} \notin F(L')$ iff $u_t \ne b_1 u_{t-1} + \ldots + b_{L'} u_{t-L'}$ for at least one $t \in \{j-k+1, \ldots, j\}$, where $(b_1, \ldots, b_{L'})$ is any binary vector.

Let us fix $j > m$ and define $L_j^{(m)}$ as the shortest length of an LFSR generating the fragment $u_j^{(m)}$, provided that the subsequence $u_{j-m}^{(L)}$, where $L = L_j^{(m)}$, forms the initial content of the shift register, i.e., (a) $u_j^{(m)} \in F(L_j^{(m)})$; (b) if $L' < L_j^{(m)}$, then $u_j^{(m)} \notin F(L')$. Using conventional notations [2], we claim that $L_j^{(m)} = L$ iff $L$ is the shortest length of an LFSR generating the subsequence $u_{j-m-L+1}, \ldots, u_j$, and all the subsequences $u_{j-m-L'+1}, \ldots, u_j$, where $L' < L$, cannot be generated by an LFSR of length $L'$. The parameter $L_j^{(m)}$ will be referred to as the m-Interval Linear Complexity (m-ILC) of u at position j, and the sequence of the values $L_j^{(m)}$ will be regarded as the profile of the m-ILC of u. Some properties of the m-ILC are detailed below.
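The definition can be explored numerically with a direct search. The sketch below is a brute-force reading of conditions (a) and (b), not the authors' procedure: it assumes the recurrence must hold for every t in the window of the last m positions ending at j, with bits at indices below 1 taken as 0, and it tries all coefficient vectors of each length. The function name `m_ilc` is hypothetical.

```python
from itertools import product


def m_ilc(bits, j, m):
    """Smallest L such that some (a_1..a_L) satisfies
    u_t = a_1 u_{t-1} + ... + a_L u_{t-L} (mod 2) for t = j-m+1, ..., j.

    bits[0] holds u_1; positions t < 1 are treated as 0.
    Returns None if no length below j works.
    """
    if not (m < j <= len(bits)):
        raise ValueError("need m < j <= len(bits)")

    def u(t):
        return bits[t - 1] if t >= 1 else 0

    window = range(j - m + 1, j + 1)
    for L in range(j):
        for a in product((0, 1), repeat=L):
            if all(u(t) == sum(a[i - 1] & u(t - i) for i in range(1, L + 1)) % 2
                   for t in window):
                return L
    return None


if __name__ == "__main__":
    u_seq = [1, 0, 1, 0, 1, 0, 1, 0]
    print([m_ilc(u_seq, j, 3) for j in range(4, len(u_seq) + 1)])
```

The exhaustive search is exponential in L and is only meant for short sequences; solving the linear system over GF(2) would be the natural replacement for larger inputs.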

Theorem.

1. Let $L_{ij}$ be the shortest length of an LFSR generating $u_i, \ldots, u_j$. Then

$$L_j^{(m)} = \min L_{ij}.$$

2. If $L_j^{(m)} = L < m$, then there is exactly one LFSR of length $L$ generating $u_{j-m-L+1}, \ldots, u_j$.

3.  If 

r  Lf_\  #  Lf\ 

L(r)  =  -  =  L%L=L<m, 
l  L<$  *  Lftl, 

then 


-tv-1  >  rn, 

ij+i+Ai  =  m  +  1  f°r  all  AI  ■■ 

rtm)  ^ 

q+l+m- L  <  m- 


0, ...,  m  —  1 


The theorem claims that the profiles of the m-ILC have a very regular structure. If the current element of the profile, $L_{j-1}^{(m)}$, is greater than m, then the next element, $L_j^{(m)}$, can be less than m, i.e., the profile 'falls into the pit'. In this case, the profile can stay in the pit for $l$ steps and then jumps to the level $m + l$ and stays at this level for $m - L_j^{(m)}$ steps. The parameters $l$ and $m - L_j^{(m)}$ can be interpreted as the 'length' and the 'depth' of the pit, and there is a duality between them.

Such behaviour gives an opportunity to mount an interval attack on a stream cipher when the cipher is constructed using some complex scheme, but an eavesdropper approximates its fragments by LFSRs of length < m. Suppose that the eavesdropper has some set of key words and assumes that they occur in the plain text. If he is right and the position of one of these words corresponds to a pit in the profile of the m-ILC, then he reconstructs the LFSR and reads all the other words of the plain text for as long as the corresponding elements of the profile belong to this pit.
The permutation group of affine-invariant codes

Abstract - Affine-invariant codes are primitive cyclic codes whose extension is invariant under the affine group. We present the formal expression of the permutation group of these codes. We then give several tools for determining these permutation groups effectively. Our main application is the permutation group of primitive narrow-sense BCH-codes defined on any prime field.

I.  The  formal  expression 

The reader can refer to [2, 4] for the definition of affine-invariant codes and their description by antichains. The permutations of coordinate places which send a code $C$ into itself form the permutation group of $C$, denoted by Per($C$); when the code is binary, this group is actually the automorphism group of $C$, usually denoted by Aut($C$).

Let $G$ be the finite field of order $p^m$ and let $k$ be a subfield of $G$. We denote by AGL($m$, $p$) the affine group of $G$ over GF($p$) and, for any divisor $e$ of $m$, by AGL($m/e$, $p^e$) the affine group of $G$ over GF($p^e$). The corresponding semi-affine group is denoted by A$\Gamma$L($m/e$, $p^e$). We consider cyclic codes $C$ of length $p^m - 1$ over $k$. The extended code $\hat{C}$ is said to be an affine-invariant code if and only if its permutation group contains AGL(1, $p^m$). Affine-invariant codes form a class including codes of great interest such as BCH-codes or Reed-Muller (RM) codes (and generalized RM-codes). Berger has recently proved that the permutation group of an affine-invariant code is contained in AGL($m$, $p$) [1]. Then a formal expression of the permutation group of any affine-invariant code can be deduced:

Theorem 1 Denote by $\theta_k$ the kth power of the Frobenius mapping on $G$. Let $C$ be a nontrivial affine-invariant code; let $\ell$ be the smallest integer dividing $m$ such that $\theta_\ell$ leaves $C$ invariant. Then there is a divisor $e$ of $m$ such that Per($\hat{C}$) is generated by AGL($m/e$, $p^e$) and $\theta_\ell$; respectively, Per($C$) is generated by GL($m/e$, $p^e$) and $\theta_\ell$.

II. To determine the permutation groups

For a large part of affine-invariant codes, mainly when $m$ is a prime, the permutation group is completely determined by applying Theorem 1. The problems appear when $m$ has nontrivial divisors.

Let $S = [0, p^m - 1]$ and let $\alpha$ be a primitive root of $G$. We call defining set of $C$ the subset $T$ of $S$ consisting of 0 and of the $s$ such that $\alpha^s$ is a zero of $C$. Let $e$ be a divisor of $m$ and $v = p^e$. We identify any $s \in S$ with its $v$-ary expansion $(s_0, \ldots, s_{m/e-1})$. The $v$-weight of $s$ is $w_v(s) = \sum_{i=0}^{m/e-1} s_i$. Then we can define the poset $(S, \ll_e)$: $s$ and $t$ in $S$ satisfy $s \ll_e t$ iff $w_v(p^k s) \le w_v(p^k t)$ for all $k$ in $[0, e-1]$. In terms of partial order the condition of Delsarte, given in [3], becomes:

Theorem 2 Assume that $C$ is affine-invariant. Then $\hat{C}$ is invariant under AGL($m/e$, $p^e$) iff its defining set $T$ satisfies:

$$t \in T \text{ and } s \ll_e t \Rightarrow s \in T.$$


We give two conditions equivalent to Delsarte's condition, providing new tools for the study of infinite classes of codes. The first one is derived from the result of Delsarte, by using the description of affine-invariant codes by antichains. The second one comes from the polynomial representation of permutations:

Theorem 3 The code $\hat{C}$ is invariant under AGL($m/e$, $p^e$) iff its defining set $T$ satisfies:

$$t \in T \text{ and } j \ll_m t \Rightarrow t + j(p^e - 1) \in T.$$

III. The p-ary BCH-codes

Theorem 4 Denote by $B(d)$ the BCH-code of designed distance $d$ and length $p^m - 1$ over GF($p$), and by $\hat{B}(d)$ the extended code. Suppose that $B(d)$ is not trivial, i.e. $d \notin \{1, p^m - 1\}$ (in the trivial case Per($B(d)$) is the symmetric group). Then the permutation group of $\hat{B}(d)$ is the semi-affine group A$\Gamma$L(1, $p^m$), except for the following cases.

• When $p = 2$, we have three kinds of exception:

1. If $d \in \{3, 2^{m-1} - 1\}$, for any $m$, or $d = 7$ for $m = 5$, then Aut($\hat{B}(d)$) is AGL($m$, 2); whence $\hat{B}(d)$ is a Reed-Muller code.

2. If $d = 2^{m-1} - 2^{(m-2)/2} - 1$, for $m$ even, then Aut($\hat{B}(d)$) = A$\Gamma$L(2, $2^{m/2}$).

3. If $m = 6$, then Aut($\hat{B}(7)$) = A$\Gamma$L(2, $2^3$) and Aut($\hat{B}(15)$) = A$\Gamma$L(3, $2^2$).

• For $p$ odd, the only exceptions are whenever $\hat{B}(d)$ is a p-ary Reed-Muller code. That is: $d \in \{2, p^{m-1}(p-1) - 1\}$, for any $m$; $d = p^2 - 2p - 1$, for $m = 2$ and $p > 3$; $d = 5$ for $m = 3$ and $p = 3$. In these cases Per($\hat{B}(d)$) = AGL($m$, $p$).

Note that Per($B(d)$) is the linear group GL() (or the semi-linear group $\Gamma$L()) when Per($\hat{B}(d)$) is AGL() (or A$\Gamma$L()).

References

[1] T. Berger, On the automorphism groups of affine-invariant codes, Designs, Codes and Cryptography, to appear.

[2] P. Charpin, Codes cycliques étendus affines-invariants et antichaînes d'un ensemble partiellement ordonné, Discrete Mathematics 80 (1990), 229-247.

[3] P. Delsarte, On cyclic codes that are invariant under the general linear group, IEEE Trans. on Info. Theory, vol. IT-16, n. 6, 1970.

[4] T. Kasami, S. Lin & W.W. Peterson, Some results on cyclic codes which are invariant under the affine group and their applications, Info. and Control, vol. 11, pp. 475-496 (1967).

[5] C.C. Lu & L.R. Welch, On automorphism groups of binary primitive BCH codes, Proceedings 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, p. 51.




Mixed-Rate  Multiuser  Codes  for  the  T-User  Binary  Adder  Channel 

A.  Brinton  Cooper  III 

Information  Science  and  Technology  Directorate,  Army  Research  Laboratory,  APG,  MD  21005-5067  USA 

Brian  L.  Hughes1 

Department  of  Electrical  and  Computer  Engineering,  The  Johns  Hopkins  University,  Baltimore,  MD  21218  USA 


Abstract - Coding schemes for the T-user binary adder channel are investigated. Recursive constructions are given for two families of mixed-rate, multiuser codes. These basic codes can be combined by time-sharing to yield codes approaching most rates in the T-user capacity region. The best codes constructed herein achieve a rate sum, $R_1 + \cdots + R_T$, which is higher than all previously reported codes for T > 4 and is within 0.519 bits/channel use of the information-theoretic limit.

Summary 

One of the most extensively investigated multiple-access channels is the binary adder channel, described as follows. T users communicate with a single receiver through a common discrete-time channel. At each time epoch, user i selects an input $X_i \in \{0, 1\}$ for transmission. The channel output is

$$Y = \sum_{i=1}^{T} X_i \qquad (1)$$

where summation is over the real numbers. We assume that there is no feedback and all users are synchronized.

Chang and Weldon [1] showed that the capacity region of the T-user binary adder channel is the set of all nonnegative rates $(R_1, \ldots, R_T)$ satisfying

$$0 \le R_i \le H_1,$$
$$0 \le R_i + R_j \le H_2,$$
$$0 \le R_1 + \cdots + R_T \le H_T, \qquad (2)$$

where

$$H_m = -\sum_{i=0}^{m} \binom{m}{i} 2^{-m} \log_2 \left[ \binom{m}{i} 2^{-m} \right]. \qquad (3)$$

In particular, observe that the largest achievable sum-rate, $R_{sum}(T) = R_1 + \cdots + R_T$, is $C_{sum}(T) = H_T$, which is called the sum-capacity.
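The sum-capacity is easy to evaluate numerically. The sketch below assumes that $H_m$ in (3) is the entropy, in bits, of a binomial(m, 1/2) random variable, which is the distribution of the channel output when m users send independent uniform bits; the function name is hypothetical.

```python
from math import comb, log2


def adder_channel_entropy(m):
    """H_m: entropy (bits) of the sum of m independent uniform binary inputs."""
    probs = [comb(m, i) / 2 ** m for i in range(m + 1)]
    return -sum(p * log2(p) for p in probs)


if __name__ == "__main__":
    for users in (1, 2, 3, 4, 8):
        print(users, round(adder_channel_entropy(users), 4))
```

With one user the value is 1 bit, and with two users it is 1.5 bits, which is the familiar sum-capacity of the two-user binary adder channel.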

Chang and Weldon [1] also presented a family of multiuser codes which are asymptotically optimal in the sense that $R_{sum}(T)/C_{sum}(T) \to 1$ as $T \to +\infty$. In their construction, each user's code consists of only two codewords which are defined recursively (so $R_1 = R_2 = \cdots = R_T$). This basic construction has been generalized in several ways [2, 3, 5, 7], and alternate constructions have been proposed based on coin weighing designs [6] and additive number theory [4].

1B.  L.  Hughes  was  supported  by  the  National  Science  Foun¬ 
dation  under  grant  NCR-9217457,  and  by  the  U.S.  Army  Re¬ 
search  Laboratory  and  the  U.S.  Army  Research  Office  under  grant 
DAAL03-89-K-0130. 


Chang and Weldon's construction shows how to approach one point on the boundary of the T-user capacity region. Similarly, all subsequent work for T > 2 has focused on the symmetric rate case, except for [5] where $R_1 = R_2 = \cdots = R_{T-1}$ but $R_T > R_1$. It is natural to ask, however, whether other points in the capacity region can be approached by a similar construction.

This talk will present two mixed-rate, multiuser code constructions for the binary adder channel. The codewords contained in these codes are equivalent, up to an affine transformation, to those in [1] and [6]; however, the recursions are adapted in order to distribute these codewords among as few users as possible. As a result, we obtain codes with a wide range of information rates. In particular, we show that these basic codes can be combined to approach all rates in the polytope

$$0 \le R_i \le H_1 - \epsilon_1,$$
$$0 \le R_i + R_j \le H_2 - \epsilon_2,$$
$$0 \le R_1 + \cdots + R_T \le H_T - \epsilon_T,$$

where $0 \le \epsilon_m \le 1.090$ bits/channel use, $1 \le m \le T$. Moreover, we construct a family of T-user codes with $R_{sum}(T) > C_{sum}(T) - 0.519$ bits/channel use, which exceeds the sum-rate of all codes previously reported in [1, 2, 3, 4, 5, 6, 7] for T > 4. Extensions to a T-user, Q-frequency adder channel are also discussed.
Abstract - In this paper, new (pm, m) and (pm, m-1) quaternary linear codes of dimension 5 are presented. These codes belong to the class of quasi-twisted codes.

I.  Introduction 

A fundamental and challenging problem in coding theory is to find a linear (n, k) code over GF(q) achieving the maximum possible minimum Hamming distance. This value is denoted as $d_q(n,k)$, and linear codes which have a minimum distance equal to $d_q(n,k)$ are called optimal. For q = 4, $d_q(n,k)$ has been determined for k < 3 and all but 10 values of d for k = 4 [1]. Many values of $d_4(n,5)$ have been established, and Brouwer [2] maintains an up-to-date table of upper and lower bounds for k < n < 132. In this paper several values of $d_4(n,5)$ are determined.


$V$ can be constructed by deleting m - k rows of (1). Codes with k = 5 and m = 5 and 6 are considered here.

A search for good QT codes requires a representative set of defining polynomials [4], which can be enumerated using Burnside's Lemma [5]. For q = 4 and m = 5, there are 70 for all values of $\alpha$. However, since $4 \nmid m$, the quaternary QT codes with $\alpha \ne 1$ are not equivalent to QC codes [4]. The results of a greedy local search are given in the next section.

III.  Construction  Results 

In addition to establishing many lower bounds on $d_4(n,5)$, the following new optimal codes (based on the bounds in [2] and the Griesmer bound) were found: a (50,5) code with d = 35, a (105,5) code with d = 76, a (110,5) code with weight distribution

Weight    0    80    84    88    92
Count     1   618   225   105    75


II.  Quasi-Twisted  Codes 

The class of quasi-twisted (QT) codes is a generalization of the class of quasi-cyclic (QC) codes over GF(q), q > 2 [4]. A code is called quasi-twisted if a negacyclic shift of a codeword by p positions results in another codeword. The blocklength, n, of a QT code is a multiple of p, so that n = mp. Many QT codes can be constructed from m x m twistulant matrices (with a suitable permutation of coordinates). In this case, the generator matrix, G, can be represented as

$$G = [B_1, B_2, \ldots, B_p] \qquad (1)$$

where the $B_i$ are m x m twistulant matrices of the form


a (115,5) code with d = 80, a (120,5) code with weight distribution

Weight    0    88    96   120
Count     1   765   255     3


a (126,5) code with d = 92, a (132,5) code with d = 96, a (205,5) code with weight distribution


Weight 

0 
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160 
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1 

810 

120 

4 

90 

and  a  (216,5)  code  with  weight  distribution 


Weight 

0 

160 

168 
and $\alpha \in GF(q) \setminus \{0\}$. The algebra of m x m twistulant matrices over GF(q) is isomorphic to the algebra of polynomials in the ring $GF(q)[x]/(x^m - \alpha)$ if $B_i$ is mapped onto the polynomial $b_i(x)$ formed from the entries in the first row of $B_i$. The $b_i(x)$ are called defining polynomials.
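The isomorphism can be illustrated numerically. The sketch below makes two assumptions that are not in the text: it uses a prime field GF(p) instead of GF(4) so that arithmetic is plain modular arithmetic, and it takes the usual twistulant form in which each row is the previous row shifted right with the wrapped-around entry multiplied by $\alpha$. All function names are hypothetical.

```python
def twistulant(first_row, alpha, p):
    """m x m twistulant matrix over GF(p) built from its first row."""
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        prev = rows[-1]
        # Shift right; the entry that wraps around is multiplied by alpha.
        rows.append([(alpha * prev[-1]) % p] + prev[:-1])
    return rows


def poly_mul_mod(a, b, alpha, p):
    """Product of a(x) and b(x) in GF(p)[x]/(x^m - alpha), low degree first."""
    m = len(a)
    out = [0] * m
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k, coef = i + j, ai * bj
            if k >= m:              # x^m is replaced by alpha
                k, coef = k - m, coef * alpha
            out[k] = (out[k] + coef) % p
    return out


def mat_mul(A, B, p):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    p, alpha = 5, 2
    a, b = [1, 2, 3], [4, 0, 1]
    left = mat_mul(twistulant(a, alpha, p), twistulant(b, alpha, p), p)
    right = twistulant(poly_mul_mod(a, b, alpha, p), alpha, p)
    print(left == right)
```

Row i of the matrix built this way holds the coefficients of $x^i b(x)$ reduced modulo $x^m - \alpha$, which is why matrix products track polynomial products.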

The  1-generator  QC  codes[6]  can  be  generalized  to  1- 
generator  QT  codes.  The  order  of  a  1-generator  QT  code, 
V,  is  defined  as 


IV.  Summary 

The construction of quasi-twisted (QT) codes over GF(4) has been presented. Many of the codes constructed have a minimum distance which establishes a lower bound on the maximum minimum distance. The new codes include several optimal codes which determine $d_4(n,5)$ for n = 50, 105, 110, 115, 120, 126, 132, 205 and 216.
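As a sanity check on some of the quoted parameters, the Griesmer bound $n \ge \sum_{i=0}^{k-1} \lceil d/q^i \rceil$ can be evaluated directly. This is the standard bound, not the authors' search procedure, and the function name is hypothetical. Only triples whose minimum distance is stated explicitly, or can be read off a complete weight table, are used.

```python
def griesmer_length(k, d, q):
    """Smallest length n allowed by the Griesmer bound for an (n, k, d) code over GF(q)."""
    return sum(-(-d // q ** i) for i in range(k))   # -(-a // b) is ceil(a / b)


if __name__ == "__main__":
    for n, d in ((50, 35), (105, 76), (120, 88)):
        print(n, d, griesmer_length(5, d, 4), griesmer_length(5, d + 1, 4))
```

For n = 120 the bound already rules out d = 89, since a (n, 5, 89) quaternary code would need n at least 121. For n = 50 and n = 105 the Griesmer bound alone does not exclude d + 1, so the optimality claim there must rest on the tabulated upper bounds in [2].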
where $\alpha \in GF(q) \setminus \{0\}$, and k, the dimension of $V$, is equal to the degree of h(x). If h(x) has degree m, (1) is a generator
Abstract - We discuss the error correction capabilities of a class of Hecke modules as linear codes and free linear block m-PSK modulation codes.

We  provide  an  introduction  to  the  study  of  modules  for  a 
Hecke  algebra  (of  type  A)  as  linear  codes  for  the  Hamming  and 
the  Euclidean  metric.  These  modules  are  called  Hecke  mod¬ 
ules  and  play  an  important  role  in  another  branch  of  math¬ 
ematics,  representation  theory  of  groups  and  algebras  [1]. 

We  first  introduce  a  class  of  Hecke  modules  in  a  purely 
combinatorial  manner.  In  particular,  we  provide  a  basis  for 
each  of  these  modules  which  can  be  easily  calculated  by  a 
computer  [2]. 

These  Hecke  modules  are  very  interesting  from  the  point 
of  view  of  coding  theory.  For  this,  note  that  these  Hecke 
modules  can  be  defined  as  vector  spaces  over  any  field  and  so 
may  be  considered  as  linear  block  codes  [2], [3].  In  particular, 
the  primitive  generalized  Reed-Muller  codes  over  the  primes 
as  well  as  shortened  versions  of  them  and  the  Simplex  codes 
emerge  as  subclasses  of  our  Hecke  modules  in  a  very  natural 
way.  We  review  Hecke  modules  whose  coding  parameters  are 
known  as  the  Specht  modules  and  several  one-step  majority 
logic  decodable  codes  [4].  Then  we  consider  so-called  char¬ 
acteristic  Hecke  modules.  A  characteristic  Hecke  module  is  a 
free  Z-module  yielding  a  linear  code  over  GF(p)  by  reducing 
the  coefficients  of  all  linear  combinations  of  its  generating  ele¬ 
ments  modulo  p  so  that  the  parameters  n ,  k  and  d  of  the  code 
are  independent  of  the  choice  of  p.  For  instance,  binary  Reed- 
Muller  codes  and  Simplex  codes  emerge  in  this  way  but  not 
generalized  Reed-Muller  codes. 

Moreover, these Hecke modules can be considered as free modules over the ring $\mathbb{Z}_m$ or one of its extension rings and therefore represent free linear block m-PSK modulation codes [5]. We have calculated the minimum squared Euclidean distance of the Hecke modules over $\mathbb{Z}_m$, m a prime, resulting from the previously discussed characteristic Hecke modules. Furthermore, we give a list of Hecke modules of length n = 6, ..., 16 over $\mathbb{Z}_8$ with a good minimum squared Euclidean distance calculated by an exhaustive computer search. Finally we compare the resulting codes with further classes of block m-PSK modulation codes such as cyclic codes over $\mathbb{Z}_m$ and multilevel codes. It will turn out that, at least for short lengths, the minimum squared Euclidean distance of our codes is as good as that of the best unrestricted modulation codes.
I. Introduction: Reed-Solomon codes over $GF(p^m)$, p a prime and m a positive integer, are cyclic, Maximum Distance Separable (MDS) and of length $p^m - 1$. The additive group of $GF(p^m)$ is elementary abelian of type (1, 1, ..., 1), isomorphic to a direct product of m cyclic groups of order p, denoted by $C_p^m$. This paper deals with MDS codes over $C_p^m$; a group code over $C_p^m$ of length $p^m - 1$ which is cyclic and MDS is called a Reed-Solomon group code. In general, a group code over $C_p^m$ need not be a linear code over $GF(p^m)$, as shown in the following example.

Example 1: Consider the length 4 code over $C_2^2 = \{1, x, y, xy\}$ consisting of the following 16 codewords.

(1,1,1,1)     (1,x,xy,y)    (1,y,y,x)     (1,xy,x,xy)
(x,1,xy,xy)   (x,x,1,x)     (x,y,x,y)     (x,xy,y,1)
(y,1,y,y)     (y,x,x,1)     (y,y,1,xy)    (y,xy,xy,x)
(xy,1,x,x)    (xy,x,y,xy)   (xy,y,xy,1)   (xy,xy,1,y)

The Hamming distance of this code is 3 and hence this is an MDS group code.

In [1], it is shown that if C is an (n, k, n-k+1) group code over an abelian group G that is not elementary abelian, then there exists an (n, k, n-k+1) group code over a smaller elementary abelian group $G'$. In view of these results a natural question that arises is "Are all MDS group codes over $C_p^m$ linear over $GF(p^m)$ as well?" Example 1 shows that this is not true in general. But, if one considers only cyclic group codes of length $p^m - 1$, then it is true. In other words, all Reed-Solomon group codes over $C_p^m$ are conventional linear codes over $GF(p^m)$. This can be shown by extending the well known transform approach for cyclic codes over finite fields [2] to group codes over elementary abelian groups.

II. Transform approach to cyclic codes over elementary abelian groups: Let $C_p^m$ denote the elementary abelian group isomorphic to the direct sum of m cyclic groups of order p each. The ring of endomorphisms of $C_p^m$ is denoted by End($C_p^m$). The set of automorphisms of $C_p^m$, denoted by Aut($C_p^m$), forms a group whose order is $p^{m(m-1)/2} \prod_{i=1}^{m} (p^i - 1)$. Among the cyclic subgroups of Aut($C_p^m$), the maximal order subgroups have order $p^m - 1$. The ring End($C_p^m$) is isomorphic to $M_m(p)$, the ring of m x m matrices over GF(p) [3]. This isomorphism gives a matrix representation for the elements of End($C_p^m$). It can be easily seen that, when this matrix representation is used, the group of nonzero elements of $GF(p^m)$, when represented by their companion matrices [4] corresponding to each irreducible polynomial of degree m, coincides with a maximal order cyclic subgroup of Aut($C_p^m$). There are other cyclic subgroups of maximal order and one can use them to define transforms which are counterparts of transforms over finite fields.

Definition 1: For any $C_p^m$, let $\Gamma$ denote a maximal order cyclic subgroup of Aut($C_p^m$). $\Gamma$ together with the all-zero matrix constitutes an elementary abelian group isomorphic to $C_p^m$ and, considered along with matrix multiplication, forms a ring called a canonical ring of $C_p^m$.

For example, the representation of a finite field with a companion matrix and its powers, along with the all-zero matrix, clearly gives a canonical ring of $C_p^m$.

Definition 2: (Generalized Discrete Fourier Transform (GDFT)): Let $a = (a_0, a_1, \ldots, a_{n-1})$, where $a_i \in C_p^m$, $i = 0, 1, \ldots, n-1$, and $n = p^m - 1$. The transform vector of a, denoted by A, is defined by

$$A_j = \bigotimes_{i=0}^{n-1} \alpha^{ij}(a_i), \quad j = 0, 1, \ldots, n-1,$$

where $\alpha$ is a generator of a cyclic subgroup of Aut($C_p^m$) of order n, and $\otimes$ denotes the group operation in $C_p^m$.

When $C_p^m$ is made $GF(p^m)$ by imposing a multiplication structure with an irreducible polynomial g(x), then all nonzero elements of $GF(p^m)$ can be represented by the companion matrix of g(x) and its powers, and $\alpha$ in Definition 2 can be replaced by the companion matrix of g(x). Then, Definition 2 coincides with the conventional DFT over $GF(p^m)$, of length $p^m - 1$.
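The claim that a companion matrix generates a cyclic group of order $p^m - 1$ can be checked for small cases. The sketch below is illustrative only: it uses one common convention for the companion matrix (ones on the subdiagonal, negated coefficients in the last column) and polynomials chosen here as examples, namely $x^2 + x + 1$ and $x^3 + x + 1$ over GF(2) and $x^2 + x + 2$ over GF(3). The function names are hypothetical.

```python
def companion(coeffs, p):
    """Companion matrix over GF(p) of the monic polynomial
    x^m + c_{m-1} x^{m-1} + ... + c_0, given coeffs = [c_0, ..., c_{m-1}]."""
    m = len(coeffs)
    M = [[0] * m for _ in range(m)]
    for i in range(1, m):
        M[i][i - 1] = 1
    for i in range(m):
        M[i][m - 1] = (-coeffs[i]) % p
    return M


def mat_mul(A, B, p):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def multiplicative_order(M, p):
    """Smallest t >= 1 with M^t equal to the identity modulo p."""
    n = len(M)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    power, order = M, 1
    while power != identity:
        power = mat_mul(power, M, p)
        order += 1
        if order > p ** (n * n):
            raise ValueError("matrix is not invertible modulo p")
    return order


if __name__ == "__main__":
    print(multiplicative_order(companion([1, 1], 2), 2))
    print(multiplicative_order(companion([1, 1, 0], 2), 2))
    print(multiplicative_order(companion([2, 1], 3), 3))
```

In each of the three examples the order equals $p^m - 1$, so the powers of the matrix together with the all-zero matrix give $p^m$ matrices, one for each field element.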

Using the GDFT given in Definition 2 and the properties of Aut($C_p^m$) and its matrix representation, the following can be proved.

Theorem 1: Every cyclic MDS group code of length $p^m - 1$ is a conventional linear code over $GF(p^m)$. In other words, all Reed-Solomon group codes over $C_p^m$ are conventional linear codes over $GF(p^m)$.
Abstract - We propose a new construction of nonlinear unequal error protection (UEP) block codes whose encoding complexity is approximately equivalent to the decoding complexity of a linear block code. Some classes of codes that are better than any linear UEP codes with the same parameters are presented.

I.  Introduction 

In the literature, studies of UEP block codes have mainly concentrated on linear codes because of the easy implementation of encoding and decoding. However, there are nonlinear UEP block codes that are better than any linear ones. In [1] [2] a construction of such codes was presented along with examples; it is based on the idea of superimposing codeword clouds originally introduced by Cover. But the drawback of the construction in [1] [2] is that there do not appear to be easily implementable methods of encoding. We propose a new construction of nonlinear UEP block codes whose encoding complexity is approximately equivalent to the decoding complexity of a linear block code.

II.  Description  of  Construction 

Here for simplicity we only consider two-level UEP codes. Let $C$ be an $(n, k_1 + k_2)$ UEP code for the message space $M_1 \times M_2$, where $M_i = GF(q)^{k_i}$, for i = 1, 2. Each message m can be written as $(m_1, m_2)$, where $m_i \in M_i$, for i = 1, 2. Let c(m) denote the corresponding codeword in $C$ for the message m. The error-correcting capability of a UEP block code is described by its separation vector $s = (s_1, s_2)$ defined by $s_i = \min\{d(c(m), c(m')) : m_i \ne m_i'\}$, for i = 1, 2, where d(a, b) denotes the Hamming distance between a and b. Let $C_1$, $C_2$, and $C_3$ be linear codes of block length n and generator matrices $G_1$, $G_2$, and $G_3$, respectively. Define $C_{23}$ to be the code with generator matrix $[G_2^T, G_3^T]^T$. The important message $m_1$ is encoded to a codeword $c_1$ in $C_1$. The less important message $m_2$ is first encoded to a codeword $c_2$ in $C_2$. The codeword $c_2$ is then decoded by using a complete nearest-neighbor decoder of $C_3$, and the output codeword, denoted by $c_3(c_2) \in C_3$, is produced. The codeword b which carries the less important message $m_2$ is obtained by $b = c_2 - c_3(c_2)$. The final transmitted codeword is $c = c_1 + b$. Clearly, the overall two-level UEP code is $C = C_1 + B$, where B is the set of all b.
Property 1: If all the rows of $[G_2^T, G_3^T]^T$ are linearly independent, the encoding mapping from the less important message space $M_2$ to B is one-to-one.
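The encoding steps can be sketched for a toy binary case. The component codes below are chosen only for illustration and are not from the paper: $C_1$ is the length-7 repetition code, $C_3$ is a (7,4) Hamming code, and $C_2$ is generated by a single weight-1 vector. The nearest-neighbor decoder is a brute-force search, and all function names are hypothetical.

```python
from itertools import product


def encode(msg, G):
    """Binary linear encoding: XOR of the rows of G selected by msg."""
    word = [0] * len(G[0])
    for bit, row in zip(msg, G):
        if bit:
            word = [a ^ b for a, b in zip(word, row)]
    return tuple(word)


def all_codewords(G):
    return [encode(m, G) for m in product((0, 1), repeat=len(G))]


def distance(a, b):
    return sum(x != y for x, y in zip(a, b))


def uep_encode(m1, m2, G1, G2, C3_words):
    c1 = encode(m1, G1)
    c2 = encode(m2, G2)
    c3 = min(C3_words, key=lambda c: distance(c2, c))   # nearest-neighbor decoding
    b = tuple(x ^ y for x, y in zip(c2, c3))            # coset leader carrying m2
    return tuple(x ^ y for x, y in zip(c1, b))


def separation_vector(G1, G2, G3):
    C3_words = all_codewords(G3)
    msgs = [(m1, m2) for m1 in product((0, 1), repeat=len(G1))
            for m2 in product((0, 1), repeat=len(G2))]
    words = {m: uep_encode(m[0], m[1], G1, G2, C3_words) for m in msgs}
    return tuple(min(distance(words[a], words[b])
                     for a in msgs for b in msgs if a[i] != b[i])
                 for i in (0, 1))


G1 = [[1, 1, 1, 1, 1, 1, 1]]
G2 = [[1, 0, 0, 0, 0, 0, 0]]
G3 = [[1, 0, 0, 0, 1, 1, 0],
      [0, 1, 0, 0, 1, 0, 1],
      [0, 0, 1, 0, 0, 1, 1],
      [0, 0, 0, 1, 1, 1, 1]]

if __name__ == "__main__":
    print(separation_vector(G1, G2, G3))
```

In this toy case every b has weight at most 1, the covering radius of the Hamming code, so the first component of the separation vector is at least 7 - 2 = 5, as the bound on $s_1$ in the next paragraph predicts.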

Let w represent the maximum weight of codewords $b \in B$. Since all b are minimum-weight coset leaders of $C_3$, we have $w \le \rho$, where $\rho$ is the covering radius of $C_3$ defined by $\rho = \max\{\min\{|y - c| : c \in C_3\} : y \in GF(q)^n\}$. Let $d_1$ denote the minimum distance of $C_1$ and $d_{23}$ be the minimum distance of the code $C_{23}$.

Property 2: $s_1 \ge d_1 - 2w \ge d_1 - 2\rho$.

Property 3: If $d_1 \ge d_{23} + 2w$, $s_2 \ge d_{23}$.

Consider two lower bounds on block length for linear UEP codes. The first bound is a generalization of the well-known Singleton bound: $n \ge s_1 + k_1 + k_2 - 1$. The second one is a generalization of the Griesmer bound: $n \ge \sum_{i=0}^{k_1-1} \lceil s_1/q^i \rceil + \sum_{i=k_1}^{k_1+k_2-1} \lceil s_2/q^i \rceil$. Notations $n_S$ and $n_G$ will be used to represent these two lower bounds.

With this new construction, there exist codes which are better
than any linear ones. For example, let $C_1$ be the repetition
code of length 24 and $C_{23}$ be the (24, 23) parity check code. We
can choose $C_3$ to be the (24, 12) extended Golay code, because
the (24, 12) extended Golay code is a subcode of the (24, 23)
parity check code. The covering radius of the (24, 12) extended
Golay code is 4. Hence this construction gives $k_1 = 1$,
$k_2 = 11$, $s_1 \ge 24 - 2 \cdot 4 = 16$, and $s_2 \ge 2$. The bounds give
$n_S = n_G \ge 27$. However, our construction only has $n = 24$.
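
The arithmetic of this example follows directly from Properties 2 and 3. As a small sketch (the function name is hypothetical, and the inequalities in the two properties are read as non-strict), the guaranteed lower bounds on the separation vector can be evaluated as follows:

```python
def guaranteed_separation(d1, d23, w):
    """Lower bounds on (s1, s2) from Properties 2 and 3.

    The bound on s2 is None when the hypothesis d1 >= d23 + 2w
    of Property 3 does not hold.
    """
    s1_lb = d1 - 2 * w
    s2_lb = d23 if d1 >= d23 + 2 * w else None
    return s1_lb, s2_lb

# Values from the example: repetition code of length 24 (d1 = 24),
# (24, 23) parity check code (d23 = 2), and the covering radius 4 of the
# extended Golay code used as the upper bound on w.
print(guaranteed_separation(d1=24, d23=2, w=4))
```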

Other  new  UEP  codes  can  be  constructed  from  BCH  codes 
and  Reed-Muller  codes.  Examples  of  these  codes  which  are 
better  than  any  linear  ones  with  the  same  parameters  are  given 
in  Tab.  1  and  Tab.  2. 


Tab. 1: Examples of UEP codes constructed from BCH codes of
length $2^m - 1$ which are better than any linear codes. (NC: no
coding; SEC, DEC, TEC: single-, double-, and triple-error-correcting BCH code.)

Tab. 2: Examples of UEP codes constructed from Reed-Muller
codes of length $2^m$ which are better than any linear codes. (The
entries for $C_1$, $C_2$, and $C_3$ give the orders of the Reed-Muller codes.)
Abstract - This paper deals with the construction of
a class of binary uniquely decodable code pairs $(C_1, C_2)$
for the two-user binary adder channel (2-BAC), where
$C_1$ is a linear code. The generator matrix $G$ for code
$C_1$ has the property that any of its columns has at
most a single 1 among its $k$ elements. These codes
are called strongly orthogonal codes in the sense that the
Hadamard product of any two distinct rows of $G$ is the
all-zero $n$-tuple. The proposed 2-BAC codes achieve the
upper bound for the sum rate when the rate of $C_1$
is greater than or equal to 1/2. Block and bit
synchronization is assumed between the users and the
receiver.


I.  INTRODUCTION 

A code pair $(C_1, C_2)$ is called a linear code for the 2-BAC
if either $C_1$ or $C_2$ is a linear code. Without loss of essential
generality we shall assume henceforth that $C_1$ is a linear code.
Our goal is to start from $C_1$ and to construct the largest code
$C_2$ such that $(C_1, C_2)$ is uniquely decodable in the 2-BAC.
Due to the linearity of code $C_1$ we can conveniently make use
of the standard array decomposition of the set of binary
$n$-tuples into cosets of $C_1$. The codewords of code $C_2$ will be
chosen from the cosets of $C_1$. We have shown [6] that the
search for codewords of code $C_2$ in one coset of $C_1$, say $v \oplus C_1$,
can be performed without interfering with future choices of
potential, i.e., not yet chosen, codewords for $C_2$ contained in
other cosets. We denote by $A_{v \oplus C_1}$ the set of vectors in the
coset $v \oplus C_1$ which are codewords of $C_2$. We have also shown
in [6] that it is possible to simplify the search for codewords
for $C_2$, within a given coset, by decomposing it into disjoint
subsets of $n$-tuples. The decomposition of a coset is neatly
done with the use of a subspace of $C_1$. In order to specify
$A_{v \oplus C_1}$ it is convenient to partition $v \oplus C_1$ into disjoint subsets,
and we thus define the set

$$S_{v \oplus C_1} = \{x_3 \in C_1 : x_3 * (x_1 \oplus y_2) = 0, \text{ for some } x_1 \in C_1\} \subset C_1 \qquad (1)$$

where $y_2 \in v \oplus C_1$. We do not need to go further with
this theory here, but remark that the objective of our specific code
construction in this paper is to guarantee that $S_{v \oplus C_1}$ is always
a subspace (of dimension $l \le k$); in general this
is not the case. We therefore introduce next a class of linear codes
for the 2-BAC for which all cosets $v \oplus C_1$ give rise to sets $S_{v \oplus C_1}$
which are subspaces easily derivable from code $C_1$.

By a strongly orthogonal code we mean a binary linear code
$C_1$ of blocklength $n$ and dimension $k$, with generator matrix
$G$ whose rows $c_i$, $i = 1, 2, \ldots, k$, have the property that

$$c_i * c_j = (0, 0, \ldots, 0), \quad \forall i, j = 1, 2, \ldots, k, \text{ with } i \neq j,$$

where $*$ denotes the Hadamard (componentwise) product.

We define the code pairs $(C_1, C_2)$ for the 2-BAC as strongly
orthogonal codes whenever $C_1$ is a strongly orthogonal code.
A strongly orthogonal code is characterized as follows.


II.  CODE  CONSTRUCTION 

Proposition 1: Let $C_1$ be a binary linear code of blocklength
$n$ and dimension $k$, with generator matrix $G$. Code $C_1$ is
strongly orthogonal if and only if each column of its generator
matrix has at most a single 1 among its $k$ elements.
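
The equivalence in Proposition 1 is easy to test numerically. The following sketch compares the two conditions on two small generator matrices; the matrices and function names are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def hadamard_orthogonal(G):
    """True if the componentwise product of every pair of distinct rows is all-zero."""
    return all(not any(a & b for a, b in zip(r1, r2))
               for r1, r2 in combinations(G, 2))

def single_one_per_column(G):
    """True if every column of G contains at most a single 1."""
    return all(sum(col) <= 1 for col in zip(*G))

# Hypothetical generator matrices, chosen only for illustration.
G_good = [[1, 0, 1, 1, 0],
          [0, 1, 0, 0, 0]]
G_bad = [[1, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]   # the third column holds two 1s

for G in (G_good, G_bad):
    print(hadamard_orthogonal(G), single_one_per_column(G))
```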

Without loss of essential generality, in the sequel we consider
a combinatorially equivalent form $G = [I_k : g]$, where
$I_k$ is the $k \times k$ identity matrix and $g$ is a $k \times (n - k)$ matrix
whose $i$th row $g_i$, $0 \le i \le k - 1$, has a string of $l_i$ consecutive
1's, with the remaining coordinates filled with 0's. If
we denote by $l_{k+1}$ the number of all-zero columns of $g$, it follows
that $\sum_i l_i = n - k$, where the sum runs over all block lengths
including $l_{k+1}$. The following proposition establishes
the maximum rate $R_{2,\max}$ achievable for code $C_2$ under the
constraint that $C_1$ is strongly orthogonal.

Proposition 2: Let $C_1$ be a strongly orthogonal code. The
maximum rate $R_{2,\max}$ for a code $C_2$ such that the pair
$(C_1, C_2)$ is uniquely decodable in the 2-BAC is given by

$$R_{2,\max} = \frac{\log\left(\sum_{m=0}^{k} 2^m \times N_m\right) + l_{k+1}}{k + \sum_i l_i} \qquad (2)$$

where $N_m$ is the number of distinct cosets whose leaders $v$
have exactly $m$ non-zero blocks out of the $k$ blocks $\{v_i\}_{i=1}^{k}$.
This number $N_m$ is given by

$$N_m = \sum_{i_1=1}^{k} \sum_{i_2=i_1+1}^{k} \cdots \sum_{i_m=i_{m-1}+1}^{k} (2^{l_{i_1}} - 1) \times (2^{l_{i_2}} - 1) \times \cdots \times (2^{l_{i_m}} - 1),$$

where $l_i$ is the blocklength of $g_i$, $1 \le i \le k + 1$.
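
Equation (2) can be evaluated mechanically once the block lengths are known. The sketch below rests on several assumptions made here: the logarithm is taken to base 2, the denominator is read as the blocklength $n$ (that is, $k$ plus all block lengths including $l_{k+1}$), and $N_m$ is computed as the sum over all $m$-subsets of the $k$ blocks. The function names are hypothetical.

```python
from itertools import combinations
from math import log2, prod

def coset_counts(block_lengths):
    """N_m for m = 0..k: sum over m-subsets of blocks of prod(2**l_i - 1)."""
    k = len(block_lengths)
    return [sum(prod(2**l - 1 for l in subset)
                for subset in combinations(block_lengths, m))
            for m in range(k + 1)]

def max_rate_c2(block_lengths, zero_cols=0):
    """Evaluate equation (2), assuming a base-2 logarithm.

    block_lengths holds l_1..l_k; zero_cols plays the role of l_{k+1}.
    """
    k = len(block_lengths)
    N = coset_counts(block_lengths)
    n = k + sum(block_lengths) + zero_cols       # denominator read as n
    total = sum(2**m * N[m] for m in range(k + 1))
    return (log2(total) + zero_cols) / n

# Smallest case: k = 1, l_1 = 1, i.e. G = [1 1] and C1 = {00, 11}.
print(coset_counts([1]), max_rate_c2([1]))
```

For $G = [1\ 1]$ this evaluates to $\log_2(3)/2$, which agrees with the familiar uniquely decodable pair $C_1 = \{00, 11\}$, $C_2 = \{00, 01, 10\}$.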
Abstract - Desmedt-Frankel (June 1991) presented
an erasure code in which the entries of the codewords
belong to any Abelian group. We extend this work to
error-correction.

I.  Introduction 

In Reed-Solomon codes the entries of the generator and the
parity check matrix, the message and the codeword vector all
belong to a finite field. We discuss a generalization of
Reed-Solomon codes in which the entries of the message tuple and the
codeword tuple belong to any Abelian group $K$. The entries
of the generator matrix $G$ and the parity check matrix $H$ are
similar to those in alternant codes but belong to a ring $R$ such that
$K$ is an $R$-module. Clearly $R$ is not necessarily a field.

II.  Background 

Let $H$ be the Vandermonde matrix $[v_{h,i}]$, where $v_{h,i} = \alpha_i^h$,
$h = 0, \ldots, n - k - 1$, $i = 0, \ldots, n - 1$ and $\alpha_i \in R$. To guarantee
a distance similar to that of alternant codes, each $(n - k) \times (n - k)$
submatrix should be invertible, which, if $R$ is commutative,
implies that for all $i$, $i'$ ($i \neq i'$) the differences $\alpha_i - \alpha_{i'}$ are units in $R$. The
following $R$ has, for example, been chosen [1]$^1$: $R = Z[u] =
Z[x]/((x^q - 1)/(x - 1))$, where $q$ is a prime larger than $n - 1$.
Choosing $\alpha_0 = 0$ and the other $\alpha_i$ as [expression illegible] satisfies the
requirements. Now, $K$ needs to be replaced by an expanded
Abelian group $K' = Z[u] \otimes_Z K$, where $\otimes$ indicates the tensor
product of modules (no knowledge of tensor products of
modules is required to understand the essence of this text).
$K'$ is a $Z[u]$-module. So the entries of $c$ and $a$ belong to $K'$.
Clearly any $k \in K$ maps easily into a $k' \in K'$. This code (to
be more precise its dual) was studied in [1] (see also [3]) as an
erasure code. The purpose of this paper is to study this code
as an error-correcting code.

III.  Decoding 

Let $K'$ be the $R$-module, where $R$ is a commutative ring. As
for extended BCH codes, there exist the following equations
between the syndromes:

$$\sum_{i=0}^{v} \Lambda_i s_{j+v-i} = 0, \quad \text{where} \qquad (1)$$

$$\Lambda(x) = \Lambda_0 + \Lambda_1 x + \cdots + \Lambda_v x^v = \prod_{l=1}^{v} (1 - x \alpha_{i_l}) \qquad (2)$$

and $j = 0, \ldots, n - 1 - k - v$, $i_l$ is an error location ($0 \le i_l \le n - 1$),
for all $i$ and $i'$ ($i \neq i'$) the difference $\alpha_i - \alpha_{i'}$ is a unit, and the syndrome
$s_j \in K'$. Since the syndromes no longer belong to a ring, the
Peterson-Gorenstein-Zierler decoder cannot be used. Indeed,
on $K'$ only an addition is defined and no internal multiplication.
This implies that the standard technique to prove that
if (1) is satisfied, then there are at most $v$ errors in the
received word, can no longer be used. Fortunately, one can
still prove (details skipped) that if $v$ errors have occurred then
for all $v' < v$ some $\theta_{j,v'} \neq 0$ for $0 \le j \le v - v' - 1$. Let us
discuss decoding of this code in more detail.

*This research has been partially supported by NSF Grant
NCR-9106327.

1 A similar ring was used later on in [2], but they worked modulo
a prime p, while no limitation on K is set here.


The obvious decoder for alternant codes is the Berlekamp-Massey
algorithm. However, the syndromes are no longer in
a finite field, but in a module. So it seems that we need to extend
this algorithm. Extensions have been presented, e.g. [4].
Unfortunately, it is not too difficult to prove that if one could
extend Berlekamp-Massey's algorithm to our scenario, then
discrete logarithm modulo $p$ and factoring integers would be
easy. (Both problems are assumed to be hard.) Let us explain
this. Given any sequence $(s_0, s_1, \ldots, s_{n-1-k})$ of elements of
a finite field, Berlekamp-Massey finds the smallest $v$ and $\Lambda_i$
such that (1) is satisfied. Now we allow any $R$-module, so that the
$s_i$ belong to the $R$-module and $\Lambda_i \in R$. Now take the
$Z_{p-1}$-module $Z_p(*)$, $p$ a prime, and define the scalar operation $a \cdot x$
as $x^a \bmod p$, where $a \in Z_{p-1}(+, *)$ and $x \in Z_p(*)$. Take
$v = n - 1 - k = 1$; then if Berlekamp-Massey could be extended
to any module, it would find $\Lambda_1$, where $-\Lambda_1$ is the discrete
log of $s_1$ in base $s_0$ (if it exists), which is believed to be hard.
Worse, replacing the ring $Z_{p-1}$ by $Z_{\phi(m)}$ and the Abelian
group $Z_p(*)$ by $Z_m(*)$ implies that if Berlekamp-Massey could
be extended for this module, factoring also would be easy.
This discussion easily extends to the expanded $K'$. So under
the assumption that discrete log is hard, Berlekamp-Massey
cannot be extended to be used to decode this code. However,
when $v$ is small an exhaustive search will allow one to easily
find the error locations! So far, we have not been able to
develop a decoder when $v$ is large.
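
The reduction to the discrete logarithm can be made concrete on a toy prime. The sketch below assumes that relation (1) for $v = 1$ is read as $s_1 + \Lambda_1 \cdot s_0 = 0$ with $\Lambda_0 = 1$; in the module $Z_p(*)$ the "addition" is multiplication modulo $p$ and the "zero" is 1. The function names and the parameters $p = 11$, $s_0 = 2$ are chosen here purely for illustration.

```python
def scalar(a, x, p):
    """Scalar operation of the Z_{p-1}-module Z_p(*): a . x = x**a mod p."""
    return pow(x, a, p)

def solve_lambda1(s0, s1, p):
    """Exhaustively find Lambda_1 with s1 * (Lambda_1 . s0) = 1 in Z_p(*).

    Written multiplicatively, this is relation (1) for v = 1 and Lambda_0 = 1.
    Returns None if s1 is not a power of s0.
    """
    for lam in range(p - 1):
        if (s1 * scalar(lam, s0, p)) % p == 1:
            return lam
    return None

p, s0 = 11, 2          # toy prime and base, chosen for illustration
s1 = pow(s0, 7, p)     # s1 = s0**7 mod p
lam1 = solve_lambda1(s0, s1, p)
dlog = (-lam1) % (p - 1)
print(lam1, dlog)
```

The exhaustive loop takes on the order of $p$ steps, so it says nothing about large $p$; it only shows that recovering $\Lambda_1$ hands back the discrete logarithm of $s_1$ in base $s_0$.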

We  conclude  by  saying  that  Berlekamp  and  Massey  were 
lucky  that  BCH  codes  were  studied  over  finite  fields. 
Abstract - We describe here one subclass of
quasi-cyclic Goppa codes with Goppa polynomial $G(x) = x^t - 1$.


I.  Introduction 

It is well known that Goppa codes include as cyclic codes only
BCH codes (with Goppa polynomial $G(x) = x^t$) [1] and double
error-correcting cyclic codes (extended double error-correcting
Goppa codes) [2]. Here we will discuss a subclass of binary
Goppa codes with Goppa polynomial $G(x) = x^t - 1$ and location
set $L = \{\alpha_j \cdot \alpha^{i \cdot l}\}$, $j = 1..p$, $i = 0..t - 1$, $p < l$, where $\alpha$ is a
primitive element of $GF(2^m)$, $l \cdot t = 2^m - 1$ and $G(\alpha_j) \neq 0$.
It is easy to show that such Goppa codes are quasi-cyclic.


II.  Quasi-cyclic  Goppa  codes 

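In this paper, as an example, we would like to discuss some
codes from a special subclass of quasi-cyclic Goppa codes with
the following type of generator matrix:

$$\begin{pmatrix}
|c_{1,1}| & |c_{1,2}| & \cdots & |c_{1,p-1}| & |0| \\
|c_{2,1}| & |c_{2,2}| & \cdots & |0| & |c_{2,p}| \\
|v_{1,1}| & |v_{1,2}| & \cdots & |v_{1,p-1}| & |v_{1,p}| \\
|v_{2,1}| & |v_{2,2}| & \cdots & |v_{2,p-1}| & |v_{2,p}| \\
\vdots & \vdots & & \vdots & \vdots \\
|v_{q,1}| & |v_{q,2}| & \cdots & |v_{q,p-1}| & |v_{q,p}|
\end{pmatrix}$$

where $|c_{i,j}|$ is the generator submatrix of a cyclic code with length $m$
and generator polynomial $c_{i,j}(x)$, $|0|$ is a zero submatrix, and $|v_{i,j}|$ is an
all-zero or all-one vector.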

1. (55,16,19)-Goppa code [3,4]. $G(x) = x^9 - \alpha^{54}$ and location
set $L = \{\alpha_j \cdot \alpha^{7i}\}$, $j = 1..6$, $i = 0..8$,
$\alpha_1 = 1$, $\alpha_2 = \alpha$, $\alpha_3 = \alpha^2$, $\alpha_4 = \alpha^3$, $\alpha_5 = \alpha^4$, $\alpha_6 = \alpha^5$, $\alpha_7 = 0$,
where $\alpha$ is a primitive element of $GF(2^6)$.

Here, for example, $|451_8|$ corresponds to [100101001], the generator
submatrix of a (9,6,2)-cyclic code with generator polynomial
$g(x) = (x^3 + 1) \cdot (x^5 + 1)$. From this code it
is easy to construct two different (with different weight
distributions) (46,9,19) [5] quasi-cyclic codes, a (46,11,17) [5]
quasi-cyclic code and a (28,9,10) quasi-cyclic code.

2. (103,20,35)-Goppa code. $G(x) = x^{17} - 1$ and location
set $L = \{\alpha_j \cdot \alpha^{15i}\}$, $j = 1..7$, $i = 0..16$,
with $\alpha_5 = \alpha^{10}$, $\alpha_6 = \alpha^{12}$, $\alpha_7 = 0$ (the exponents of
$\alpha_1, \ldots, \alpha_4$ are illegible), where $\alpha$ is a primitive element of $GF(2^8)$.

The generator polynomials are given here in octal, e.g.
$|011251_8|$ corresponds to |00001001010101001|, the generator
submatrix of a (17,8,6)-cyclic code with generator polynomial
$g(x) = (x^8 + x^4) \cdot (x^8 + x^5 + x^4 + x^3 + 1)$.
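
The octal notation can be cross-checked against the stated factorizations. The sketch below assumes two conventions inferred from the two worked examples rather than stated in the text: each octal digit expands to three bits and only the last `length` bits are kept, and the leftmost bit of a block is the coefficient of $x^0$. All function names are hypothetical.

```python
def octal_block_to_bits(octal, length):
    """Expand an octal block to a bit string and keep the last `length` bits."""
    bits = "".join(format(int(d, 8), "03b") for d in octal)
    return bits[-length:]

def bits_to_poly(bits):
    """Read bit i (counted from the left) as the coefficient of x**i.

    The polynomial is returned as an integer bit mask.
    """
    return sum(1 << i for i, b in enumerate(bits) if b == "1")

def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as integer bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly(*exponents):
    """Bit mask of the GF(2) polynomial with the given exponents."""
    return sum(1 << e for e in exponents)

# Example 1: |451| versus (x^3 + 1)(x^5 + 1), block length 9.
block1 = octal_block_to_bits("451", 9)
g1 = gf2_mul(poly(3, 0), poly(5, 0))

# Example 2: |011251| versus (x^8 + x^4)(x^8 + x^5 + x^4 + x^3 + 1), block length 17.
block2 = octal_block_to_bits("011251", 17)
g2 = gf2_mul(poly(8, 4), poly(8, 5, 4, 3, 0))

print(block1, bits_to_poly(block1) == g1)
print(block2, bits_to_poly(block2) == g2)
```

The first block is a palindrome, so it does not by itself fix the bit order; the second block is what suggests reading the leftmost bit as the constant term.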

From this code it is easy to construct the best known (102,16,40)
quasi-cyclic code with generator matrix

$$\begin{pmatrix}
|342021| & |011251| & |331364| & |074202| & |143377| & |000000| \\
|332533| & |364213| & |016316| & |264774| & |000000| & |143377|
\end{pmatrix}$$

This code improves the lower bounds on the maximum minimum
distance for (102,16), (101,16), (100,16) and (99,16) binary
linear codes [6].

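3. (136,20,52)-Goppa code. $G(x) = x^{17} - 1$ and location set
$L = \{\alpha_j \cdot \alpha^{15i}\}$, $j = 1..8$, $i = 0..16$,
$\alpha_1 = \alpha^1$, $\alpha_2 = \alpha^2$, $\alpha_3 = \alpha^4$, $\alpha_4 = \alpha^7$, $\alpha_5 = \alpha^8$,
$\alpha_6 = \alpha^{1?}$ (second digit illegible), $\alpha_7 = \alpha^{13}$, $\alpha_8 = \alpha^{14}$,
where $\alpha$ is a primitive element of $GF(2^8)$. From
this code it is easy to construct a (119,11,52) quasi-cyclic code.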
Abstract - Linear codes with parameters [47, 5, 30; 3],
[44, 6, 27; 3], [90, 6, 57; 3] and [94, 6, 60; 3] have been found.


I. Introduction

Let $GF(q)$ denote the Galois field of $q$ elements. A linear code
of length $n$, dimension $k$, and minimum Hamming distance $d$
over $GF(q)$ is called an $[n, k, d; q]$-code. Let $n_q(k, d)$ denote
the minimum $n$ for which an $[n, k, d; q]$ code exists.
For linear codes over $GF(q)$ with $q > 2$, there is a natural
generalization of the class of cyclic codes to the class
of constacyclic codes [2]. A constacyclic ($\alpha$-twisted) code has the
following property: for some fixed element $\alpha$ of $GF(q)$, if
$(a_0, a_1, \ldots, a_{n-1})$ is a codeword then $(\alpha a_{n-1}, a_0, a_1, \ldots, a_{n-2})$
is a codeword too. The theory of constacyclic codes is very
similar to that of cyclic codes.

The algebra of twistulant $m \times m$ matrices over $GF(q)$ is
isomorphic to the algebra of polynomials in the ring $F[x]/(x^m - \alpha)$.
The $[pm, k]$-codes $C$ with generator matrices of type
$[B_1, B_2, \ldots, B_p]$, where each $B_i$ is a twistulant matrix, are
called quasi-twisted [4].

Let $c_1(x), c_2(x), \ldots, c_p(x)$ be the polynomials corresponding to the
twistulant $m \times m$ matrices $B_1, B_2, \ldots, B_p$ and
$h(x) = (x^m - \alpha)/\gcd(x^m - \alpha, c_1(x), c_2(x), \ldots, c_p(x))$. Then the dimension $k$
of $C$ is equal to the degree of $h(x)$. Two polynomials, $c_j(x)$
and $c_i(x)$, belong to the same class if $c_j(x) = a x^l c_i(x) \bmod (x^m - \alpha)$,
for some integer $l \ge 0$. Two twistulant matrices,
$B_i$ and $B_j$, are called conjugates if $c_i(x)$ and $c_j(x)$ belong to
the same class.

Good quasi-twisted codes are obtained if there are no conjugates
in the generator matrix.

II. Results

Lemma 1. [3] $47 \le n_3(5, 30) \le 48$, $44 \le n_3(6, 27) \le 45$,
$89 \le n_3(6, 57) \le 91$, $93 \le n_3(6, 60) \le 96$.

Theorem 1.

(i) $89 \le n_3(6, 57) \le 90$;

(ii) $n_3(5, 30) = 47$, $n_3(6, 27) = 44$, $93 \le n_3(6, 60) \le 94$.

Proof:

(i) The [90, 6, 57; 3] codes were constructed as quasi-twisted
with rates $1/p$ and $(m - 4)/pm$. The generator polynomials
are:

110000, 000121, 000122, 001002, 001022, 001101, 001211, 010122,
010212, 011011, 011021, 011112, 011122, 011212, 111111;

1021210000, 1210111000, 2111211000, 2021101000, 1202001100,
1022111100, 1112221100, 1211012100, 1210201110.

(ii) Codes with parameters [47, 5, 30; 3], [44, 6, 27; 3] are
constructed by the method from [1]. The generator matrices are

$G_1$ =
00000000000000000111111111111111111111111111111
00000001111111111000000000011111111112222222222
00111110000011112001111222200001112220000111222
11000111112200221110012001201120120011112001002
01012010121201010010122122110102001210120020122

$G_2$ =
00000000000000111111111111111111111111111111
00000111111111000000000111111111111222222222
00111001122222011112222001111122222011112222
00012120101122201220112120112201122101220112
01002202110102020021022121020120102012022011
10010100011220000212102201021221001112100212

The weight distributions are:

[47, 5, 30; 3]: $A_0 = 1$, $A_{30} = 166$, $A_{33} = 46$, $A_{36} = 20$,
$A_{39} = 8$, $A_{42} = 2$;

[44, 6, 27; 3]: $A_0 = 1$, $A_{27} = 352$, $A_{30} = 264$, $A_{33} = 24$,
$A_{36} = 88$.

A [94, 6, 60; 3]-code was also constructed by the method from
[1], and has a weight distribution $A_0 = 1$, $A_{60} = 456$, $A_{63} = 76$,
$A_{69} = 192$, $A_{72} = 4$.
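
Weight distributions of this kind admit two quick consistency checks: the coefficients must sum to $3^k$, and, provided no coordinate is zero in every codeword, $\sum_w w A_w = n (q - 1) q^{k-1}$. The sketch below applies both checks to the three distributions quoted above and also gives a brute-force enumerator for small codes. The function names are hypothetical, the index 63 for the [94, 6, 60; 3]-code is the reading that satisfies the second identity, and the tetracode at the end is a standard small example used here only to exercise the enumerator.

```python
from itertools import product
from collections import Counter

def weight_distribution(G, q=3):
    """Weight distribution {w: A_w} of the code generated by the rows of G over GF(q), q prime."""
    n = len(G[0])
    dist = Counter()
    for coeffs in product(range(q), repeat=len(G)):
        word = [sum(c * row[i] for c, row in zip(coeffs, G)) % q for i in range(n)]
        dist[sum(1 for x in word if x)] += 1
    return dict(dist)

def consistent(n, k, dist, q=3):
    """Check sum(A_w) = q**k and sum(w * A_w) = n * (q - 1) * q**(k - 1).

    The second identity assumes that no coordinate is zero in every codeword.
    """
    total = sum(dist.values())
    first_moment = sum(w * a for w, a in dist.items())
    return total == q**k and first_moment == n * (q - 1) * q**(k - 1)

# Distributions as quoted in the text.
stated = {
    (47, 5): {0: 1, 30: 166, 33: 46, 36: 20, 39: 8, 42: 2},
    (44, 6): {0: 1, 27: 352, 30: 264, 33: 24, 36: 88},
    (94, 6): {0: 1, 60: 456, 63: 76, 69: 192, 72: 4},
}
for (n, k), dist in stated.items():
    print((n, k), consistent(n, k, dist))

# Sanity check of the enumerator on the ternary [4, 2, 3] tetracode.
tetracode = [[1, 0, 1, 1], [0, 1, 1, 2]]
print(weight_distribution(tetracode))
```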